Preface


Combinatorics, the mathematics of the discrete, has blossomed in this generation. On the theoretical
side, a variety of tools, concepts and insights have been developed that allow us to solve
previously intractable problems, formulate new problems and connect previously unrelated topics.
On the applied side, scientists from physicists to biologists have found combinatorics essential in
their  research.  In  all  of  this,  the  interaction  between  computer  science  and  mathematics  stands  out 
as  a  major  impetus  for  theoretical  developments  and  for  applications  of  combinatorics.  This  text 
provides  an  introduction  to  the  mathematical  foundations  of  this  interaction  and  to  some  of  its 
results. 

Advice  to  Students 


This  book  does  not  assume  any  previous  knowledge  of  combinatorics  or  discrete  mathematics.  Except 
for  a  few  items  which  can  easily  be  skipped  over  and  some  of  the  material  on  "generating  functions" 
in  Part  IV,  calculus  is  not  required.  What  is  required  is  a  certain  level  of  ability  or  "sophistication" 
in  dealing  with  mathematical  concepts.  The  level  of  mathematical  sophistication  that  is  needed  is 
about  the  same  as  that  required  in  a  solid  beginning  calculus  course. 

You  may  have  noticed  similarities  and  differences  in  how  you  think  about  various  fields  of 
mathematics  such  as  algebra  and  geometry.  In  fact,  you  may  have  found  some  areas  more  interesting 
or  more  difficult  than  others  partially  because  of  the  different  thought  patterns  required.  The  field 
of  combinatorics  will  also  require  you  to  develop  some  new  thought  patterns.  This  can  sometimes 
be  a  difficult  and  frustrating  process.  Here  is  where  patience,  mathematical  sophistication  and  a 
willingness  to  ask  "stupid  questions"  can  all  be  helpful. 

Combinatorics  differs  as  much  from  mathematics  you  are  likely  to  have  studied  previously  as 
algebra  differs  from  geometry.  Some  people  find  this  disorienting  and  others  find  it  fascinating.  The 
introductions  to  the  parts  and  to  the  chapters  can  help  you  orient  yourself  as  you  learn  about 
combinatorics.  Don't  skip  them. 

Because  of  the  newness  of  much  of  combinatorics,  a  significant  portion  of  the  material  in  this  text 
was  only  discovered  in  this  generation.  Some  of  the  material  is  closely  related  to  current  research. 
In  contrast,  the  other  mathematics  courses  you  have  had  so  far  probably  contained  little  if  anything 
that  was  not  known  in  the  Nineteenth  Century.  Welcome  to  the  frontiers! 

The  Material  in  this  Book 


Combinatorics  is  too  big  a  subject  to  be  done  justice  in  a  single  text.  The  selection  of  material  in  this 
text  is  based  on  the  need  to  provide  a  solid  introductory  course  for  our  students  in  pure  mathematics 
and  in  mathematical  computer  science.  Naturally,  the  material  is  also  heavily  influenced  by  our  own 
interests  and  prejudices. 

Parts  I  and  II  deal  with  two  fundamental  aspects  of  combinatorics:  enumeration  and  graph 
theory.  "Enumeration"  can  mean  either  counting  or  listing  things.  Mathematicians  have  generally 
limited  their  attention  to  counting,  but  listing  plays  an  important  role  in  computer  science,  so  we 
discuss  both  aspects.  After  introducing  the  basic  concepts  of  "graph  theory"  in  Part  II,  we  present 
a variety of applications of interest in computer science and mathematics. Induction and recursion
play  a  fundamental  role  in  mathematics.  The  usefulness  of  recursion  in  computer  science  and  in 
its  interaction  with  combinatorics  is  the  subject  of  Part  III.  In  Part  IV  we  look  at  "generating 
functions,"  a  powerful  tool  for  studying  counting  problems.  We  have  included  a  variety  of  material 
not  usually  found  in  introductory  texts: 

•  Trees  play  an  important  role.  Chapter  3  discusses  decision  trees  with  emphasis  on  ranking  and 
unranking.  Chapter  9  is  devoted  to  the  theory  and  application  of  rooted  plane  trees.  Trees 
have  many  practical  applications,  have  an  interesting  and  accessible  theory  and  provide  solid 
examples  of  inductive  proofs  and  recursive  algorithms. 

•  Software  and  network  sorts  are  discussed  in  Chapter  8.  We  have  attempted  to  provide  the 
overview  and  theory  that  is  often  lacking  elsewhere. 

• Part IV is devoted to the important topic of generating functions. We could not, in good conscience,
deny our students access to the more combinatorial approaches to generating functions
that have emerged in recent years. This necessitated a longer treatment than a quick ad hoc
treatment would require. Asymptotic analysis of generating functions presented a dilemma. On
the one hand, it is very useful; while on the other hand, it cannot be done justice without an
introductory course in complex analysis. We chose a somewhat uneasy course: In the last section
we presented some rules for analysis that usually work and can be understood without a
knowledge of complex variables.

Planning  a  Course 


A  variety  of  courses  can  be  based  on  this  text.  Depending  on  the  material  covered,  the  pace  at  which 
it  is  done  and  the  level  of  rigor  required  of  the  students,  this  book  could  be  used  in  a  challenging 
lower  division  course,  in  an  upper  division  course  for  engineering,  science  or  mathematics  students, 
or  in  a  beginning  graduate  course.  There  are  a  number  of  possibilities  for  choosing  material  suitable 
for  each  of  these  classes.  A  graduate  course  could  cover  the  entire  text  at  a  leisurely  pace  in  a  year 
or  at  a  very  fast  pace  in  a  semester.  Here  are  some  possibilities  for  courses  with  a  length  of  one 
semester  to  two  quarters,  depending  on  how  much  parenthesized  optional  material  is  included.  Parts 
of  an  optional  chapter  can  also  be  used  instead  of  the  entire  chapter. 

•  A  lower  division  course:  1,  2.1-2.3,  (2.4),  3.1,  (4.1),  5.1,  (5.2),  5.3-5.5,  (6),  7.1,  7.2,  (7.3),  (8), 
9.1,  (9.2). 

•  An  upper  division  or  beginning  graduate  course  emphasizing  mathematics:  1-3,  4.1,  (4.2),  4.3, 
5, 6.1, (6.2-6.4), 7, (8), 9.1, (9.2-9.3), 10, (11).

•  An  upper  division  or  beginning  graduate  course  emphasizing  computer  science:  1-3,  4.1,  5,  6.1, 
6.3,  (6.4),  (6.5),  7,  8,  (9.1),  9.2,  9.3,  10,  (11.4). 

Asterisks,  or  stars,  (*)  appear  before  various  parts  of  the  text  to  help  in  course  design.  Starred 
exercises  are  either  more  difficult  than  other  exercises  in  that  section  or  depend  on  starred  material. 
Starred  examples  are  generally  more  difficult  than  other  material  in  the  chapter.  A  section  or  chapter 
that  is  not  as  central  as  the  rest  of  the  material  is  also  starred.  The  material  in  Part  IV,  especially 
parts  of  Chapter  11,  is  more  difficult  than  the  rest  of  the  text. 

Special thanks are due Fred Kochman whose many helpful comments have enhanced the readability
of this manuscript and reduced its errors. This manuscript was developed using TeX, Donald E.
Knuth's impressive gift to technical writing.


PART  I 

Counting  and  Listing 


Enumerative  combinatorics  deals  with  listing  and  counting  the  elements  in  finite  sets.  Why  is 
this  of  interest?  Determining  the  number  of  elements  in  a  finite  set  is  the  fundamental  tool  in  the 
analysis  of  the  running  times  and  space  demands  of  computer  algorithms.  It  is  also  of  importance  in 
various  areas  of  science,  most  notably  statistical  mechanics  and  structural  chemistry,  and  in  some 
areas  of  mathematics.  Listing  elements  plays  a  role  in  the  design  of  algorithms  and  in  structural 
chemistry  as  well  as  other  areas.  In  addition,  the  questions  may  be  interesting  in  themselves;  that  is, 
some  people  find  questions  of  the  form  "How  many  structures  are  there  such  that  . . .  ?"  interesting. 

In  this  part  we'll  study  some  fundamental  counting  and  listing  techniques.  These  tools  are  useful 
throughout  combinatorics  and  many  of  them  are  essential  for  other  topics  that  are  covered  later  in 
the  text.  In  particular,  the  Rules  of  Sum  and  Product  form  the  basis  of  practically  all  of  our  counting 
calculations.  The  set  theoretic  notation  and  terminology  that  we  introduce  in  the  first  two  chapters 
is  also  important  for  the  remainder  of  the  text. 

Here are some examples of the types of problems that our tools will enable us to solve
systematically.

Example  1  Birthdays  There  are  30  students  in  a  classroom.  What  is  the  probability  that  all 
of  them  have  different  birthdays?  Let's  assume  that  people  are  equally  likely  to  be  born  on  each  day 
of  the  year  and  that  a  year  has  365  days.  (Neither  assumption  is  quite  correct.)  Suppose  we  could 
determine  the  number  of  possible  ways  birthdays  could  be  assigned  to  30  people,  say  N,  and  the 
number  of  ways  this  could  be  done  so  that  all  the  birthdays  are  different,  say  D.  Our  answer  would 
be D/N. □
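The quantities N and D can be sketched numerically. The following is a minimal illustration, not the text's own derivation: it assumes, anticipating the Rule of Product from Chapter 1, that N counts lists of 30 days with repeats allowed and D counts lists of 30 distinct days. The function name `prob_all_distinct` is a hypothetical helper.

```python
from fractions import Fraction

def prob_all_distinct(people: int = 30, days: int = 365) -> Fraction:
    """Return D/N under the equally-likely-birthday assumption."""
    n_total = days ** people      # N: each person may have any day
    d_distinct = 1                # D: each person takes a day not yet used
    for used in range(people):
        d_distinct *= days - used
    return Fraction(d_distinct, n_total)

p = prob_all_distinct()
print(float(p))   # a little under 0.3
```

Exact rational arithmetic is used so that no rounding occurs until the final conversion to a float.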

Example  2  Names  In  a  certain  land  on  a  planet  in  a  galaxy  far  away  the  alphabet  contains 
only  5  letters  which  we  will  transliterate  as  A,  I,  L,  S  and  T  in  that  order.  All  names  are  6  letters 
long,  begin  and  end  with  consonants  and  contain  two  vowels  which  are  not  adjacent  to  each  other. 
Adjacent  consonants  must  be  different.  How  many  possible  names  are  there?  Devise  a  systematic 
method  for  listing  them  in  dictionary  order.  The  list  begins  with  LALALS,  LALALT,  LALASL, 
LALAST,  LALATL,  LALATS,  LALILS  and  ends  with  TSITAT,  TSITIL,  TSITIS,  TSITIT.  □ 

Example 3 Data storage An ecologist plans to simulate on a computer the growth of a biological
community containing 6 competing species. He plans to try numerous variations in the
environment. After each simulation, he'll order the species from most abundant to least abundant.
He wants to keep track of how often each ordering occurs and, after his simulations are over, manipulate
the collection of counts in various ways. How can he index his storage in a compact manner? □

Example 4 Arrangements We have 32 identical dominoes with no marks on them and a chessboard.
Each domino will exactly cover two squares of the chessboard. How many ways can we arrange
the dominoes so that they cover the entire board? □

Example 5 Symmetries We have a cube, some red paint and some green paint. How many
different  ways  can  we  paint  the  cube  so  that  each  face  is  either  all  red  or  all  green?  Obtain  a  list  of 
all  the  different  ways.  A  first  approach  to  this  problem  might  be  to  consider  coloring  the  first  face 
red  or  green,  then  the  second  red  or  green  and  so  on  until  we  have  a  list  of  all  the  possibilities.  This 
ignores  an  important  fact  about  the  problem:  a  cube  looks  the  same  from  many  points  of  view;  i.e., 
it  has  symmetries.  For  example,  there  is  only  one  way  to  color  a  cube  so  that  it  has  one  red  face  and 
five green ones. □

In  the  next  four  chapters,  we'll  study  some  fundamental  techniques  for  counting  and  listing 
structures. What do we mean by "structures"? It is simply a general term for whatever things we
are  counting.  We  chose  it  rather  than  "thing"  or  "object"  to  emphasize  that  what  we  are  counting 
has  some  internal  organization.  If  the  things  we  were  trying  to  list  or  count  did  not  have  some  sort 
of  internal  organization,  we  would  have  no  way  to  systematically  analyze  them.  A  list  of  30  distinct 
birthdays  or  a  cube  with  colored  faces  are  two  examples  of  structures.  Such  a  list  has  an  internal 
organization:  For  the  birthdays,  we  have  30  distinct  days  of  the  year  written  in  some  order.  A  cube 
has  considerable  structure  because  it  can  be  rotated  in  various  ways  and  end  up  occupying  the  same 
space. 

Understanding  the  internal  organization  clearly  is  the  first  step  in  solving  a  counting  or  listing 
problem.  For  the  birthdays,  the  organization  is  a  sequence  of  30  distinct  numbers  between  1  and  365 
inclusive.  For  the  cube,  the  organization  is  somehow  tied  to  the  cube's  symmetries.  We'll  easily  see 
how  to  answer  the  birthday  question  thanks  to  the  simple  description  of  the  structure's  organization. 
We  have  problems  with  the  cube  because  we  haven't  yet  come  up  with  a  clear  description. 

In  Chapter  1  we'll  study  some  simple  structures — ordered  and  unordered  lists  with  and  without 
repetitions — and  we'll  introduce  tools  for  counting  them.  The  main  tools  are  the  Rules  of  Sum  and 
Product.  Recursions  and  generating  functions  will  also  appear  briefly.  We'll  return  to  generating 
functions  in  Part  IV. 

In Chapter 2 we'll study functions and "permutations." Besides being of interest in themselves,
functions provide another way to look at the material in Chapter 1, and permutations are essential
for  Chapter  4.  We  conclude  Chapter  2  with  a  discussion  of  Boolean  functions  and  combinatorial 
logic. 

It  is  frequently  necessary  to  generate  combinatorial  objects  or  accumulate  information  about 
them  rather  than  just  count  them.  In  Chapter  3  you'll  see  how  to  use  the  function  viewpoint  and 
"trees"  to  generate  lists  of  structures.  Furthermore,  we'll  study  how  to  access  particular  items  in 
the list using "ranking." This is what we need for the ecologist's problem. Trees are an important
structure  in  computer  science,  so  you'll  encounter  them  again  in  this  book. 

In  Chapter  4  we'll  study  two  rather  unrelated  topics  that  did  not  merit  a  separate  chapter. 
The  first  topic  is  counting  and  listing  structures  with  symmetries.  Our  earlier  notion  of  permutation 
provides  the  foundation  for  this  discussion.  The  second  topic  is  a  method  of  counting  structures  in 
a  somewhat  indirect  manner,  called  the  "Principle  of  Inclusion  and  Exclusion." 

Preliminary  Reading 


At  various  points  in  our  discussion  we  will  need  to  make  use  of  proof  by  induction.  In  fact,  induction 
is  a  more  common  proof  technique  in  combinatorics  than  in  most  other  branches  of  mathematics. 
We  recommend  that  you  review  proof  by  induction  in  Appendix  A  (p.  361). 

At  times  we  will  estimate  values  using  "Big-Oh"  and  "little-oh"  notation  as  well  as  the  notation 
f(n) ~ g(n). These are discussed in Section B.1 (p. 368). You may wish to look at this section
quickly  now  and  refer  back  to  it  as  needed. 

Since  probability  is  a  natural  adjunct  to  counting,  we'll  encounter  it  from  time  to  time  in  the 
examples  and  homework.  The  necessary  background  is  reviewed  in  Appendix  C  (p.  381).  You  should 
look  this  over  now  and  refer  back  to  it  as  needed. 

The algebraic rules for operating with sets are also familiar to most beginning university students.
Here is such a list of the basic rules. In each case the standard name of the rule is given first, followed
by the rule as applied first to ∩ and then to ∪.

Theorem 0.1 Algebraic rules for sets The universal set U is not mentioned explicitly
but is implicit when we use the notation ~X = U − X for the complement of X. An alternative
notation is X^c = ~X.

Associative:      (P ∩ Q) ∩ R = P ∩ (Q ∩ R)          (P ∪ Q) ∪ R = P ∪ (Q ∪ R)

Distributive:     P ∩ (Q ∪ R) = (P ∩ Q) ∪ (P ∩ R)    P ∪ (Q ∩ R) = (P ∪ Q) ∩ (P ∪ R)

Idempotent:       P ∩ P = P                          P ∪ P = P

Double Negation:  ~~P = P

DeMorgan:         ~(P ∩ Q) = ~P ∪ ~Q                 ~(P ∪ Q) = ~P ∩ ~Q

Absorption:       P ∪ (P ∩ Q) = P                    P ∩ (P ∪ Q) = P

Commutative:      P ∩ Q = Q ∩ P                      P ∪ Q = Q ∪ P
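These rules can be spot-checked by machine. The sketch below tests them on randomly chosen subsets of a small universal set. It is a sanity check rather than a proof, and the universal set `U` and the helper names `comp`, `random_subset` and `rules_hold` are assumptions made for the illustration.

```python
import random

U = frozenset(range(8))   # hypothetical universal set for the check

def comp(x):
    """Complement ~X = U - X."""
    return U - x

def random_subset(rng):
    return frozenset(e for e in U if rng.random() < 0.5)

def rules_hold(P, Q, R):
    return all([
        (P & Q) & R == P & (Q & R),             # associative
        (P | Q) | R == P | (Q | R),
        P & (Q | R) == (P & Q) | (P & R),       # distributive
        P | (Q & R) == (P | Q) & (P | R),
        P & P == P and P | P == P,              # idempotent
        comp(comp(P)) == P,                     # double negation
        comp(P & Q) == comp(P) | comp(Q),       # DeMorgan
        comp(P | Q) == comp(P) & comp(Q),
        P | (P & Q) == P and P & (P | Q) == P,  # absorption
        P & Q == Q & P and P | Q == Q | P,      # commutative
    ])

rng = random.Random(0)
all_ok = all(
    rules_hold(random_subset(rng), random_subset(rng), random_subset(rng))
    for _ in range(1000)
)
print(all_ok)   # True
```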


CHAPTER  1 

Basic  Counting 


Introduction 


Before  beginning,  we  must  confront  some  matters  of  notation.  Two  words  that  we  shall  often  use 
are set and list. Both words refer to collections of objects. There is no standard notation for lists.
Some  of  those  in  use  are 

apple  banana  pear  peach  a  list  of  four  items  . . . 

apple,  banana,  pear,  peach  commas  added  for  clarity  . . . 

and    (apple,  banana,  pear,  peach)  parentheses  added. 

The notation for sets is standard: the items are separated by commas and surrounded by curly brackets
as  in 

{apple,  banana,  pear,  peach}. 

The  curly  bracket  notation  for  sets  is  so  well  established  that  you  can  normally  assume  it  means  a 
set — but  beware,  Mathematica®  uses  curly  brackets  for  lists. 

What  is  the  difference  between  a  set  and  a  list?  Quite  a  bit,  and  nothing.  "Set"  means  a  collection 
of  distinct  objects  in  which  the  order  doesn't  matter.  Thus 

{apple,  peach,  pear}    and    {peach,  apple,  pear} 

are the same sets, and the set {apple, peach, apple} is the same as the set {apple, peach}. In other
words, repeated elements are treated as if they occurred only once. Thus two sets are the same if
and only if each element that is in one set is in both. In a list, order is important and repeated
objects are usually allowed. Thus

(apple,  peach)    (peach,  apple)    and    (apple,  peach,  apple) 

are  three  different  lists.  Two  lists  are  the  same  if  and  only  if  they  have  exactly  the  same  items  in 
exactly  the  same  positions.  Thus,  sets  and  lists  are  different. 

On the other hand, people talk about things like "unordered lists," "sets with repetition," and
so  on.  In  fact,  a  set  with  repetition  is  so  common  that  it  has  a  name:  multiset.  Two  multisets  are 
the  same  if  and  only  if  each  item  that  occurs  exactly  k  times  in  one  of  them  occurs  exactly  k  times 
in  both.  In  summary 

•  list:  an  ordered  sequence  (repeats  allowed), 

•  set:  a  collection  of  distinct  objects  where  order  does  not  matter, 

•  multiset:  a  collection  of  objects  (repeats  allowed)  where  order  does  not  matter. 
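The three notions of "sameness" can be seen side by side in a short sketch. Here Python's built-in `list` and `set` and the standard-library `collections.Counter` stand in for lists, sets and multisets. That correspondence is an illustrative assumption, not notation used in this text.

```python
from collections import Counter

a = ["apple", "peach", "apple"]
b = ["peach", "apple"]

same_as_lists = (a == b)                        # order and repeats both matter
same_as_sets = (set(a) == set(b))               # neither order nor repeats matter
same_as_multisets = (Counter(a) == Counter(b))  # repeats matter, order does not

print(same_as_lists, same_as_sets, same_as_multisets)   # False True False
```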

Thus,  an  ordered  set  with  repetition  allowed  is  a  list  and  an  unordered  list  of  distinct  elements  is 
a set. Whenever we refer to a list, we will indicate whether the elements must be distinct. Unless we
say otherwise, a list is ordered. An ordered list is sometimes called a string, a sequence or a word. A
list  is  also  called  a  sample  or  a  selection,  especially  in  probability  and  statistics.  Lists  are  sometimes 
called  vectors  and  the  elements  components. 

The terminology "k-list" is frequently used in place of the more cumbersome "k long list."
Similarly, we use k-set and k-multiset. Vertical bars (also used for absolute value) are used to denote
the number of elements in a set or in a list. For example, if S is an n-set, then |S| = n.

We  want  to  know  how  many  ways  we  can  do  various  things  with  a  set.  Here  are  some  examples, 
which  we  illustrate  by  using  the  set  S  =  {x,  y,  z}. 

1. How many ways can we list, without repetition, all the elements of S? This means, how many
ways can we arrange the elements of S in an (ordered) list so that each element of S appears
exactly once in each of the lists? For the illustration, there are six ways: xyz, xzy, yxz, yzx, zxy
and zyx. (These are all called permutations of S. People often use Greek letters like π and σ to
indicate a permutation of a set.)

2. How many ways can we construct a k-list of distinct elements from the set? When k = |S|, this
is the previous question. If k = 2 in the illustration, there are six ways: xy, xz, yx, yz, zx and zy.

3. If the list in the previous question is allowed to contain repetitions, what is the answer? There
are nine ways for the illustration: xx, xy, xz, yx, yy, yz, zx, zy and zz.

4.  If,  in  Questions  2  and  3,  the  order  in  which  the  elements  appear  in  the  list  doesn't  matter,  what 
are  the  answers?  For  the  illustration,  the  answers  are  three  and  six,  respectively. 

5.  How  many  ways  can  the  set  S  be  partitioned  into  a  collection  of  k  pairwise  disjoint  nonempty 
smaller sets? With k = 2, the illustration has three such: {{x},{y,z}}, {{x,y},{z}} and
{{x,z},{y}}. 
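The answers quoted for the illustration can be reproduced by brute force. The sketch below uses standard `itertools` functions for Questions 1 through 4. For Question 5 it uses a small recursive generator, `partitions_into_blocks`, which is a hypothetical helper written for this illustration and not a method taken from the text.

```python
from itertools import (permutations, product, combinations,
                       combinations_with_replacement)

S = ["x", "y", "z"]
k = 2

q1 = len(list(permutations(S)))                        # all of S, no repeats
q2 = len(list(permutations(S, k)))                     # k-lists, no repeats
q3 = len(list(product(S, repeat=k)))                   # k-lists, repeats allowed
q4a = len(list(combinations(S, k)))                    # unordered, no repeats
q4b = len(list(combinations_with_replacement(S, k)))   # unordered, repeats allowed

def partitions_into_blocks(items, k):
    """Yield the partitions of items into exactly k nonempty blocks."""
    if not items:
        if k == 0:
            yield []
        return
    first, rest = items[0], items[1:]
    # Either `first` forms a block by itself ...
    for p in partitions_into_blocks(rest, k - 1):
        yield [[first]] + p
    # ... or it joins one of the blocks of a partition of the rest.
    for p in partitions_into_blocks(rest, k):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

q5 = len(list(partitions_into_blocks(S, k)))
print(q1, q2, q3, q4a, q4b, q5)   # 6 6 9 3 6 3
```

Listing everything is feasible only for tiny sets, which is the reason for developing counting formulas.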

We'll  learn  how  to  answer  these  questions  without  going  through  the  time-consuming  process  of 
constructing  (listing)  all  the  items  in  question  as  we  did  for  our  illustration.  Our  answer  to  the  last 
question  will  be  somewhat  unsatisfactory.  Other  answers  to  it  will  be  discussed  in  later  chapters. 

1.1    Lists  with  Repetitions  Allowed 


How many ways can we construct a k-list (repeats allowed) using an n-set? Look at our illustration
in Question 3 above. The first entry in the list could be x, y or z. After any of these there were three
choices (x, y or z) for the second entry. Thus there are 3 × 3 = 9 ways to construct such a list. The
general pattern should be clear: There are n ways to choose each list entry. Thus

Theorem 1.1 There are n^k ways to construct a k-list from an n-set.

This  calculation  illustrates  an  important  principle: 

Theorem 1.2 Rule of Product Suppose structures are to be constructed by making a
sequence of k choices such that, (i) the ith choice can be made in c_i ways, a number independent
of what choices were made previously, and (ii) each structure arises in exactly one way in this
process. Then, the number of structures is c_1 × ⋯ × c_k.

"Structures"  as  used  above  can  be  thought  of  simply  as  elements  of  a  set.  We  prefer  the  term 
structures  because  it  emphasizes  that  the  elements  are  built  up  in  some  way;  in  this  case,  by  making 
a  sequence  of  choices.  In  the  previous  calculation,  the  structures  are  lists  of  k  things  which  are 
built  up  by  adding  one  thing  at  a  time.  Each  thing  is  chosen  from  a  given  set  of  n  things  and 
c_1 = c_2 = ... = c_k = n.
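A brute-force check of Theorem 1.1 and of the Rule of Product might look like the following sketch. The second half illustrates a case where the set of available choices depends on earlier choices while the number of choices does not. The function name `count_k_lists` is an assumption made for the illustration.

```python
from itertools import product
from math import prod

def count_k_lists(n: int, k: int) -> int:
    """Count k-lists over an n-set by listing them all."""
    return sum(1 for _ in product(range(n), repeat=k))

# Theorem 1.1: c_1 = ... = c_k = n, so the count should be n**k.
theorem_ok = all(count_k_lists(n, k) == n ** k
                 for n in range(1, 5) for k in range(0, 4))

# 2-lists of distinct elements of S: what is available for the second
# entry depends on the first entry, but there are always 2 choices.
S = ["x", "y", "z"]
distinct_2_lists = [(a, b) for a in S for b in S if b != a]
by_rule_of_product = prod([len(S), len(S) - 1])   # c_1 = 3, c_2 = 2

print(theorem_ok, len(distinct_2_lists), by_rule_of_product)   # True 6 6
```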




Definition 1.1 Cartesian Product If C_1, ..., C_k are sets, the Cartesian product of the
sets is written C_1 × ⋯ × C_k and consists of all k-lists (x_1, ..., x_k) with x_i ∈ C_i for 1 ≤ i ≤ k.

A special case of the Rule of Product is the fact that the number of elements in C_1 × ⋯ × C_k is
the product |C_1| ⋯ |C_k|. Here C_i is the collection of ith choices and c_i = |C_i|. This is only a special
case because the Rule of Product would allow the collection C_i to depend on the previous choices
x_1, ..., x_{i−1} as long as the number c_i of possible choices does not depend on x_1, ..., x_{i−1}. The last
example in Appendix A gives a proof of this special case of the Rule of Product. In fact, that proof
can be altered to give a proof of the general case of the Rule of Product. We will not do so.

Here  is  a  property  associated  with  Cartesian  products  that  we  will  find  useful  in  our  later 
discussions. 

Definition 1.2 Lexicographic Order If C_1, ..., C_k are ordered lists of distinct elements,
we may think of them as sets and form the Cartesian product P = C_1 × ⋯ × C_k. The lexicographic
order on P is defined by saying that a_1...a_k < b_1...b_k if and only if there is some
t ≤ k such that a_i = b_i for i < t and a_t < b_t.

Often we say lex order instead of lexicographic order. If all the C_i's equal (0,1,2,3,4,5,6,7,8,9),
then lex order is simply numerical order of k digit integers with leading zeroes allowed. Suppose that
all the C_i's equal (<space>, A, B, ..., Z). If we throw out those elements of P that have a letter
following a space, the result is dictionary order. Unlike these two simple examples, the C_i's usually
vary with i.
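The definition translates directly into a comparison routine. In the sketch below, `lex_less` is a hypothetical helper that follows Definition 1.2, and the lists `C1` and `C2` are made-up examples. The sketch also checks the digit example, relying on the documented behavior of `itertools.product`, which emits tuples in lexicographic order of its inputs.

```python
from itertools import product

C1 = ["a", "b"]      # hypothetical ordered lists of distinct elements
C2 = [0, 1, 2]

def lex_less(a, b, lists):
    """True if a precedes b in lex order on the product of `lists`."""
    for x, y, c in zip(a, b, lists):
        if x != y:                        # first position t where they differ
            return c.index(x) < c.index(y)
    return False                          # a and b are equal

P = list(product(C1, C2))
in_lex_order = all(lex_less(P[i], P[i + 1], [C1, C2])
                   for i in range(len(P) - 1))

# With every C_i equal to the digits 0..9, lex order is numerical order.
digits = list(range(10))
as_numbers = [100 * a + 10 * b + c for a, b, c in product(digits, repeat=3)]

print(in_lex_order, as_numbers == list(range(1000)))   # True True
```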

Example  1.1  A  simple  count  The  North-South  streets  in  Rectangle  City  are  named  using  the 
numbers 1 through 12 and the East-West streets are named using the letters A through H. Thus,
the  most  southwesterly  intersection  occurs  where  First  and  A  streets  meet.  How  many  blocks  are 
within  the  city? 

We may think of the city as consisting of rows of blocks. Each row contains the blocks
encountered  as  we  cross  the  city  from  East  to  West.  The  number  of  rows  is  the  number  of  rows  of 
blocks  encountered  as  we  cross  the  city  from  North  to  South.  This  is  much  like  the  rows  and  columns 
of  a  matrix.  We  can  apply  the  Rule  of  Product:  Choose  a  row  and  then  choose  a  block  in  that  row. 
What answer does this give? If you think it is 12 × 8 = 96, you're almost correct. Read on.

Each block can be labeled by the streets at its southwesterly corner. These labels have the form
(x, y) where x is between 1 and 11 inclusive and y is between A and G. (If you don't see why 12
and H are missing, draw a picture and look at southwesterly corners.) By the Rule of Product there
are 11 × 7 = 77 blocks. In this case the structures can be taken to be the descriptions of the blocks.
Each description has two parts: the names of the North-South and East-West streets at the block's
southwest  corner.  □ 
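The labeling argument can be sketched in a few lines. The variable names are assumptions made for the illustration.

```python
from string import ascii_uppercase

north_south = list(range(1, 13))         # streets 1 through 12
east_west = list(ascii_uppercase[:8])    # streets A through H

# A block is labeled by the streets at its southwest corner, so the last
# street in each direction (12 and H) never appears in a label.
blocks = [(x, y) for x in north_south[:-1] for y in east_west[:-1]]

print(len(blocks), blocks[0], blocks[-1])   # 77 (1, 'A') (11, 'G')
```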

Example 1.2 Counting names We now return to the faraway galaxy that was mentioned in
Example 2 (p. 1).

The  possible  positions  for  the  two  vowels  are  (2, 4),  (2, 5)  and  (3,  5).  Each  of  these  results  in  two 
isolated  consonants  and  two  adjacent  consonants.  Thus  the  answer  is  the  product  of  the  following 
factors: 

• choose the vowel locations (3 ways);

• choose the vowels (2 × 2 ways);

• choose the isolated consonants (3 × 3 ways);

• choose the adjacent consonants (3 × 2 ways).

The answer is 648. This construction can be interpreted as a Cartesian product as follows. C_1 is the
set of lists of possible positions for the vowels, C_2 is the set of lists of vowels in those positions, and
C_3 and C_4 are sets of lists of consonants. Thus

C_1 = {(2,4), (2,5), (3,5)}    C_2 = {AA, AI, IA, II}

C_3 = {LL, LS, LT, SL, SS, ST, TL, TS, TT}    C_4 = {LS, LT, SL, ST, TL, TS}.

For example, ((2,5), IA, SS, ST) in the Cartesian product corresponds to the word SISTAS. □

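As an independent check of the count, one can list every 6-letter string over the alphabet and keep those satisfying the rules of Example 2. This brute-force sketch is not the systematic listing method the example asks for. The helper `is_name` is an assumption made for the illustration.

```python
from itertools import product

ALPHABET = "AILST"     # alphabetical order given in Example 2
VOWELS = set("AI")

def is_name(w):
    """Check the naming rules of Example 2 for a 6-letter tuple w."""
    is_vowel = [ch in VOWELS for ch in w]
    if is_vowel[0] or is_vowel[-1]:       # must begin and end with consonants
        return False
    if sum(is_vowel) != 2:                # exactly two vowels
        return False
    for a, b in zip(w, w[1:]):
        if a in VOWELS and b in VOWELS:   # vowels may not be adjacent
            return False
        if a not in VOWELS and b not in VOWELS and a == b:
            return False                  # adjacent consonants must differ
    return True

# product() runs through strings in lex order, which here is dictionary order.
names = ["".join(w) for w in product(ALPHABET, repeat=6) if is_name(w)]

print(len(names), names[:3], names[-1])
# 648 ['LALALS', 'LALALT', 'LALASL'] TSITIT
```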
Here's  another  important  principle,  the  proof  of  which  is  self  evident: 

Theorem 1.3 Rule of Sum Suppose a set T of structures can be partitioned into sets
T_1, ..., T_j so that each structure in T appears in exactly one T_i. Then

|T| = |T_1| + ... + |T_j|.

Example 1.3 Counting names (revisited) We'll redo the previous example using this principle.

The possible vowel (V) and consonant (C) patterns for names are CCVCVC, CVCCVC
and CVCVCC. Since these patterns are disjoint and cover all cases, we must compute the number
of names of each type and add the results together. For the first pattern we have a product of
six factors, one for each choice of a letter: 3 × 2 × 2 × 3 × 2 × 3 = 216. The other two patterns also
give 216, for a total of 648 names.

This approach has a wider range of applicability than the method we used in the previous example.
We were only able to avoid the Rule of Sum in the first method because each pattern contained
the same number of vowels, isolated consonants and adjacent consonants. Here's an example that
requires the Rule of Sum. Suppose a name consists of only four letters, namely two vowels and two
consonants, constructed so that the vowels are not adjacent and, if the consonants are adjacent,
then they are different. There are three patterns: CVCV, VCVC, VCCV. By the Rule of Product, the
first two are each associated with 36 names, but VCCV is associated with only 24 names because of
the adjacent consonants. Hence, we cannot choose a pattern and then proceed to choose vowels and
consonants. On the other hand, we can apply the Rule of Sum to get a total of 96 names. □
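The per-pattern products and their sum can be sketched as follows. The function `names_for_pattern` is a hypothetical helper that applies the Rule of Product along one pattern, giving a consonant one fewer choice when it directly follows another consonant.

```python
from math import prod

N_VOWELS, N_CONSONANTS = 2, 3

def names_for_pattern(pattern: str) -> int:
    """Rule of Product along one C/V pattern."""
    factors = []
    for i, kind in enumerate(pattern):
        if kind == "V":
            factors.append(N_VOWELS)
        elif i > 0 and pattern[i - 1] == "C":
            factors.append(N_CONSONANTS - 1)   # must differ from its neighbor
        else:
            factors.append(N_CONSONANTS)
    return prod(factors)

# Rule of Sum: the patterns are disjoint and cover all cases.
six = {p: names_for_pattern(p) for p in ["CCVCVC", "CVCCVC", "CVCVCC"]}
four = {p: names_for_pattern(p) for p in ["CVCV", "VCVC", "VCCV"]}

print(sum(six.values()), four, sum(four.values()))
# 648 {'CVCV': 36, 'VCVC': 36, 'VCCV': 24} 96
```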

Example 1.4 Smorgasbord College committees Smorgasbord College has four departments
which have 6, 35, 12 and 7 faculty members. The president wishes to form a faculty judicial
committee  to  hear  cases  of  student  misbehavior.  To  avoid  the  possibility  of  ties,  the  committee  will 
have  three  members.  To  avoid  favoritism  the  committee  members  will  be  from  different  departments 
and  the  committee  will  change  daily.  If  the  committee  only  sits  during  the  normal  academic  year 
(165  days),  how  many  years  can  pass  before  a  committee  must  be  repeated? 

If T is the set of all possible committees, the answer is |T|/165. Let T_i be the set of committees
with no members from the ith department. By the Rule of Sum |T| = |T_1| + |T_2| + |T_3| + |T_4|. By
the Rule of Product

|T_1| = 35 × 12 × 7 = 2940    |T_3| = 35 × 6 × 7 = 1470

|T_2| = 6 × 12 × 7 = 504    |T_4| = 35 × 12 × 6 = 2520.

Thus  the  number  of  years  is  7434/165  =  45+.  Due  to  faculty  turnover,  a  committee  need  never 
repeat — if  the  president's  policy  lasts  that  long.  □ 
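The arithmetic can be reproduced with a short sketch. The variable names are assumptions. Choosing which three departments are represented is equivalent to choosing the one department left out.

```python
from itertools import combinations
from math import prod

dept_sizes = [6, 35, 12, 7]
DAYS_PER_YEAR = 165

# Rule of Sum over the choice of three departments,
# Rule of Product within each choice.
total = sum(prod(trio) for trio in combinations(dept_sizes, 3))

print(total, total / DAYS_PER_YEAR)   # 7434 committees, a little over 45 years
```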
Using the Rules of Sum and Product


Whenever we encounter a new technique, there are two questions that arise:


•  When  is  it  used? 


•  How  is  it  used? 


For  the  Rules  of  Sum  and  Product,  the  answers  are  intertwined: 

Technique  Rules  for  AND  and  OR  Suppose  you  wish  to  count  the  number  of  structures 
in  a  set  and  that  you  can  describe  how  to  construct  the  structures  in  terms  of  subconstructions 
that are connected by "ands" and "ors." If this leads to the construction of each structure in a
unique  way,  then  the  Rules  of  Sum  and  Product  apply.  To  use  them,  replace  "ands"  by  products 
and  "ors"  by  sums.  Whenever  you  write  something  like  "Do  A  AND  do  B,"  it  should  mean  "Do  A 
AND  THEN  do  B"  because  the  Rule  of  Product  requires  that  the  choices  be  made  sequentially. 
We  will  usually  omit  "then". 

Example  1.5  Applying  the  technique  To  see  how  this  technique  is  applied,  let's  look  back  at 
Example  1.4.  A  committee  consists  of  either 

•  One  person  from  Dept.  1  AND  one  person  from  Dept.  2  AND  one  person  from  Dept.  3,  OR 

•  One  person  from  Dept.  1  AND  one  person  from  Dept.  2  AND  one  person  from  Dept.  4,  OR 

•  One person from Dept. 1 AND one person from Dept. 3 AND one person from Dept. 4, OR

•  One  person  from  Dept.  2  AND  one  person  from  Dept.  3  AND  one  person  from  Dept.  4. 

The  number  of  ways  to  choose  a  person  from  a  department  equals  the  number  of  people  in  the 
department. □

Until you become comfortable using the Rules of Sum and Product, look for "and" and "or" in
what  you  do.  This  is  an  example  of  the  divide  and  conquer  tactic:  break  the  problem  into  parts  and 
work  on  each  piece  separately.  Here  the  first  part  is  getting  a  phrasing  with  "ands"  and  "ors;"  the 
second  part  is  calculating  each  of  the  individual  pieces;  and  the  third  part  is  applying  the  Rules  of 
Sum  and  Product. 

Example 1.6 Palindromes A palindrome is a list that reads the same from right to left as it
does from left to right. For example, ignoring capitalization, punctuation and spaces, "Madam I'm
Adam." becomes the palindrome madamimadam.

How many k-long palindromes can be formed from an n-set? The first $\lceil k/2 \rceil$ list elements are
arbitrary and the remaining elements are determined.* Thus the answer is $n^{\lceil k/2 \rceil}$.

Imagine a necklace of beads with a clasp. How many k-bead necklaces can be formed if we are
given n different colors of round beads? When the necklace is worn we can tell the end of the necklace
because of the clasp, but we can't distinguish a left end from a right end. We can think of this as
k-long lists where we consider two lists the same if one can be obtained from the other by reversing
the list. If a list is a palindrome, it contributes one to the count. If a list is not a palindrome, the
list and its reversal together contribute one to the count.

Let p be the number of palindrome lists and q the number of non-palindrome lists. We want
$p + q/2$. The number of lists is $p + q$, which equals $n^k$, and the number of palindromes is $n^{\lceil k/2 \rceil}$. Thus

$$p + q = n^k \quad\text{and}\quad p = n^{\lceil k/2 \rceil},$$

and so $q = n^k - n^{\lceil k/2 \rceil}$. Finally we obtain our answer:

$$p + \frac{q}{2} = n^{\lceil k/2 \rceil} + \frac{n^k - n^{\lceil k/2 \rceil}}{2} = \frac{n^{\lceil k/2 \rceil} + n^k}{2}.$$

* The notation $\lceil x \rceil$ means the least integer not less than x (that is, round up). For example, $\lceil \pi \rceil = 4$.
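
The necklace formula can be tested against a direct enumeration for small n and k. The following Python sketch (function names are illustrative) identifies each list with its reversal by keeping the smaller of the two as a canonical representative.

```python
from itertools import product

def clasped_necklaces(n, k):
    """Count k-long lists over n colors, treating a list and its reversal as the same."""
    seen = set()
    for beads in product(range(n), repeat=k):
        seen.add(min(beads, beads[::-1]))   # canonical representative of the pair
    return len(seen)

def palindromes(n, k):
    """Count k-long lists over n colors that equal their own reversal."""
    return sum(1 for beads in product(range(n), repeat=k) if beads == beads[::-1])

def necklace_formula(n, k):
    half = (k + 1) // 2                     # ceiling of k/2
    return (n**half + n**k) // 2

print(clasped_necklaces(3, 4), necklace_formula(3, 4))
```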


Example  1.7  Listing  instead  of  counting  Suppose  we  want  to  write  a  program  to  actually 
list the things in a set T rather than just counting them. Instead of computing $|T|$, we have to
execute a program that lists all items $t \in T$. What about the Rules of Sum and Product? The Rule
of Sum becomes

    For each t_1 in T_1: list t_1.
    For each t_2 in T_2: list t_2.
    ...
    For each t_j in T_j: list t_j.

The Rule of Product becomes

    For each first choice d_1:
        ...
            For each kth choice d_k:
                List the structure arising from the choices d_1, ..., d_k.
            End for
        ...
    End for

This  is  actually  more  general  than  Theorem  1.2  since,  in  the  code,  the  number  of  choices  in  each 
loop  may  depend  on  previous  choices.  See  Chapter  3  for  more  discussion.  □ 
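
The two listing schemes can be written as Python generators. This is only a sketch: `list_sum` and `list_product` are hypothetical names, and representing each step as a function of the earlier choices is one possible way to let the options in a loop depend on what was chosen before.

```python
def list_sum(*parts):
    """Rule of Sum: list every element of T_1, then T_2, ..., then T_j.
    The parts are assumed to be disjoint, so nothing is listed twice."""
    for part in parts:
        for t in part:
            yield t

def list_product(steps, partial=()):
    """Rule of Product: nested loops built recursively.

    steps[i](partial) returns the options for choice i+1 given the
    choices already made, so later loops may depend on earlier ones."""
    if len(partial) == len(steps):
        yield partial
        return
    for d in steps[len(partial)](partial):
        yield from list_product(steps, partial + (d,))

# Usage: list the 2-lists without repeats from {1, 2, 3}.
items = [1, 2, 3]
steps = [lambda chosen: items,
         lambda chosen: [x for x in items if x not in chosen]]
pairs = list(list_product(steps))
print(pairs)
```

Here the second loop ranges over a set that depends on the first choice, yet it always has two options, so the number of structures listed is still a product.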


Exercises 


In  each  of  the  exercises,  indicate  how  you  are  using  the  Rules  of  Sum  and  Product.  You  can  do  this  with 
the  AND/OR  technique. 

1.1.1. How many different three digit positive integers are there? (No leading zeroes are allowed.) How
many positive integers with at most three digits? What are the answers when "three" is replaced by
"n"?

1.1.2.  A  small  neighboring  country  of  the  one  we  revisited  in  Example  1.3  has  the  same  alphabet  and  the 
same  rules  of  formation,  but  names  are  only  five  letters  long.  How  many  names  are  possible? 

1.1.3. Prove that the number of subsets of a set S, including the empty set and S itself, is $2^{|S|}$.
Hint.  For  each  element  of  S  you  must  make  one  of  two  choices:  "x  is/isn't  in  the  subset." 

1.1.4. A composition of a positive integer n is an ordered list of positive integers (called parts) that sum
to n. The four compositions of 3 are 3; 2,1; 1,2 and 1,1,1.

(a)  By  considering  ways  to  insert  plus  signs  and  commas  in  a  list  of  n  ones,  obtain  a  formula  for  the 
number  of  compositions  of  n. 

Hint. The four compositions above correspond to 1+1+1; 1+1,1; 1,1+1 and 1,1,1, respectively.

(b) Prove that the average number of parts in a composition of n is $(n + 1)/2$.

Hint.  Reverse  the  roles  of  "+"  and  ","  and  then  look  at  the  number  of  parts  in  the  original  and 
role-reversed  compositions. 
*1.1.5. In Example 1.3 we found that there were 648 possible names. Suppose that these are listed in the
usual  dictionary  order.  What  is  the  last  word  in  the  first  half  of  the  dictionary  (the  324th  word)?  the 
first  word  in  the  second  half? 
What happens if we do not allow repeats in our list? Suppose we have n elements to choose from
and wish to form a k-list with no repeats. How many lists are there?

We can choose the first entry in the list AND choose the second entry AND ... AND choose
the kth entry. There are $n - i + 1$ ways to choose the ith entry since $i - 1$ elements have been
removed from the set to make the first part of the list. By the Rule of Product, the number of lists
is $n(n-1)\cdots(n-k+1)$. Using the notation $n!$ for the product of the first n positive integers and writing
$0! = 1$, you should be able to see that this answer can be written as $n!/(n-k)!$, which is often
designated by $(n)_k$ and called the falling factorial. We have proven

Theorem 1.4 When repeats are not allowed, there are $n!/(n-k)! = (n)_k$ k-lists that can be
constructed from an n-set.

When k = n, a list without repeats is simply a linear ordering of the set. We frequently say
"ordering" instead of "linear ordering." An ordering is sometimes called a "permutation" of S. Thus,
we have proven that a set S can be (linearly) ordered in $|S|!$ ways.

Example 1.8 Lists without repeats How many lists without repeats can be formed from a
5-set? There are 5! = 120 5-lists without repeats, 5!/1! = 120 4-lists without repeats, 5!/2! = 60
3-lists, 5!/3! = 20 2-lists and 5!/4! = 5 1-lists. By the Rule of Sum, this gives a total of 325 lists,
or 326 if we count the empty list. In Exercise 1.2.11 you are asked to obtain an estimate when "5-set"
is replaced with "n-set".

Suppose we have a problem involving k-lists with repeats allowed and we want the formula when
repeats are not allowed. Since allowing repeats leads to powers and forbidding repeats leads to falling
factorials, we might try to replace powers with falling factorials. Doing this without thinking can
easily give the wrong answer. Look back at Example 1.6 where we needed to count palindromes
and obtained the formula $p = n^{\lceil k/2 \rceil}$. Except for 1-long lists, a palindrome has repeated elements;
for example, the first and last elements are equal. Thus we obtain p = n when k = 1 and p = 0 when
k > 1 for palindromes without repeats. □
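
The counts in Example 1.8 can be reproduced with the standard library. This sketch enumerates the lists with `itertools.permutations` and compares each count with the falling factorial computed by `math.perm`; the 5-set "abcde" is an arbitrary stand-in.

```python
from itertools import permutations
from math import perm

S = "abcde"                                   # any 5-set will do

# Number of k-lists without repeats, found by listing them.
by_length = {k: sum(1 for _ in permutations(S, k)) for k in range(0, 6)}

# Theorem 1.4: the count equals the falling factorial n!/(n-k)!.
matches_theorem = all(by_length[k] == perm(5, k) for k in by_length)

total_nonempty = sum(by_length[k] for k in range(1, 6))   # Rule of Sum
print(by_length, total_nonempty)
```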

Lists can appear in many guises. In this next example, the people could be thought of as the
positions in a list and the seats the things in the list. Sometimes it helps to find a reinterpretation
like  this  for  a  problem.  At  other  times  it  is  easier  to  tackle  the  problem  starting  over  again  from 
scratch.  These  methods  can  lead  to  several  approaches  to  a  problem.  That  can  make  the  difference 
between  a  solution  and  no  solution  or  between  a  simple  solution  and  a  complicated  one.  You  should 
practice  using  both  methods,  even  on  the  same  problem. 

Example 1.9 Linear arrangements How many different ways can 100 people be arranged in
the seats in a classroom that has exactly 100 seats?

Each  seating  is  simply  an  ordering  of  the  people.  Thus  the  answer  is  100!.  Simply  writing  100! 
probably  gives  you  little  idea  of  the  size  of  the  number  of  seatings.  A  useful  approximation  for 
factorials  is  given  by  Stirling's  formula: 
Theorem 1.5 Stirling's formula $\sqrt{2\pi n}\,(n/e)^n$ approximates $n!$ with a relative error
under $1/(10n)$.

We say that f(x) approximates g(x) with a relative error at most $\delta(x)$ if $|f(x)/g(x) - 1| \le \delta(x)$.
Thus, the theorem states that $\sqrt{2\pi n}\,(n/e)^n / n!$ differs from 1 by less than $1/(10n)$. When relative
error is multiplied by 100, we obtain "percentage error." If we simply want to note that the relative
error goes to 0 as $n \to \infty$, we can write†

$$n! \sim \sqrt{2\pi n}\,(n/e)^n \quad\text{or, equivalently,}\quad n! = \sqrt{2\pi n}\,(n/e)^n\,(1 + o(1)).$$

This is weaker than Theorem 1.5 because o(1) stands for something that can be replaced by some
function h(n) with $\lim_{n\to\infty} h(n) = 0$, but the theorem tells us more, namely the function h(n) is so
small that $|h(n)| < 1/(10n)$.

By Stirling's formula, we find that 100! is nearly $9.32 \times 10^{157}$, which is much larger than estimates
of the number of atoms in the universe.
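
The error bound in Theorem 1.5 is easy to probe numerically. The sketch below evaluates Stirling's formula in floating point and compares it with the exact factorial for n up to 100; it is a spot check over that range, not a proof, and the function names are illustrative.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * (n/e)**n as a float."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

def relative_error(n):
    """|f/g - 1| with f Stirling's formula and g the exact value of n!."""
    return abs(stirling(n) / math.factorial(n) - 1)

for n in (1, 10, 100):
    print(n, relative_error(n), 1 / (10 * n))
```

In each printed row the observed relative error sits below the bound 1/(10n) from the theorem.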

Now suppose we still have 100 seats but have only 95 people. We need to think a bit more
carefully than before. One approach is to put the people in some order (e.g., alphabetical), select
a list of 95 seats, and then pair up people and seats so that the first person gets the first seat, the
second person the second seat, and so on. By the general formula for lists without repetition, the
answer is 100!/(100 - 95)! = 100!/120. We can also solve this problem by thinking of the people as
positions in a list and the seats as entries. Do it. □

The  next  example  is  starred  because  it  is  above  the  level  of  this  chapter;  therefore  you  may  want 
to  just  skim  it  or  maybe  even  omit  it.  It  illustrates  some  of  the  calculations  that  one  often  runs  into 
in  obtaining  estimates  for  large  values  of  n  and  obtains  the  useful  formula  (1.2). 

*Example 1.10 Estimating $n!/(n-k)!$ This example requires familiarity with the notations
O( ) and o( ), which are discussed in Appendix B.

Suppose we want to estimate the number of k-lists without repeats that can be formed from an
n-set; that is, we want to estimate $n!/(n-k)!$. In this example, we're interested in obtaining the
estimate when n is large and k is much smaller than n. Of course, we can use Stirling's formula,
which gives us the estimate

$$\frac{n!}{(n-k)!} \sim \frac{\sqrt{2\pi n}\,(n/e)^n}{\sqrt{2\pi(n-k)}\,((n-k)/e)^{n-k}} = \frac{n^{n+1/2}\,e^{-k}}{(n-k)^{n-k+1/2}}.$$

This is still rather messy. How can we simplify it? We have

$$\frac{n^{n+1/2}}{(n-k)^{n-k+1/2}} = n^k\left(\frac{n}{n-k}\right)^{n-k+1/2} = n^k\left(1 + \frac{k}{n-k}\right)^{n-k+1/2}.$$

We  need  a  result  from  calculus: 

If x is small, then $\ln(1+x) = x - x^2/2 + O(x^3)$ and so $1 + x = \exp(x - x^2/2 + O(x^3))$.    (1.1)

If you know Taylor's Theorem, you should be able to prove it; otherwise, just accept the result. Since
k is much smaller than n, $k/(n-k)$ is small. Let it be x. By (1.1),

$$\left(1 + \frac{k}{n-k}\right)^{n-k+1/2} = \exp\bigl(A\,(n-k+1/2)\bigr) \quad\text{where}\quad A = \frac{k}{n-k} - \frac{1}{2}\left(\frac{k}{n-k}\right)^{2} + O\bigl((k/(n-k))^3\bigr).$$

With some algebra and the ability to work with O( ), one can deduce that

$$\exp\bigl(A\,(n-k+1/2)\bigr) = \exp\bigl(k - k^2/2n + O(k^3/n^2)\bigr).$$


† The notation in the next equations is discussed in Appendix B. It simply means that
$n! / \bigl(\sqrt{2\pi n}\,(n/e)^n\bigr) \to 1$ as $n \to \infty$.
These manipulations are beyond what we expect of you at this point, so we'll omit them; you'll
have to figure out how to do them or just accept this result.
Putting all this together:

$$\frac{n!}{(n-k)!} \sim \frac{n^{n+1/2}\,e^{-k}}{(n-k)^{n-k+1/2}} = n^k\,e^{-k^2/2n + O(k^3/n^2)}.$$

If $k^3 = o(n^2)$, then $O(k^3/n^2) = o(1)$ and so $\exp(O(k^3/n^2)) = e^{o(1)} \sim 1$. Thus we have

$$\frac{n!}{(n-k)!} \sim n^k e^{-k^2/2n} \quad\text{provided } k = o(n^{2/3}). \qquad (1.2)$$

For example, by Theorem 1.4, the number of 200-lists without repeats that can be formed from a
10,000-set is about $10^{800}/e^2$. □
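
To see how good estimate (1.2) is for n = 10,000 and k = 200, one can compare logarithms, since the numbers themselves are far too large for floating point. This is a minimal sketch with illustrative function names; it relies on `math.perm` for the exact integer and on `math.log` accepting large integers.

```python
import math

def log_exact(n, k):
    """Natural log of n!/(n-k)!, computed from the exact integer."""
    return math.log(math.perm(n, k))

def log_estimate(n, k):
    """Natural log of the estimate n**k * exp(-k*k/(2*n)) from (1.2)."""
    return k * math.log(n) - k * k / (2 * n)

n, k = 10_000, 200
ratio = math.exp(log_exact(n, k) - log_estimate(n, k))   # exact / estimate
print(ratio)
```

A ratio close to 1 means the estimate has a small relative error, which is what the notation in (1.2) promises when k is small compared with $n^{2/3}$.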

Example 1.11 Words from a collection of letters How many "words" of length k can
be formed from the letters in ERROR when no letter may be used more often than it appears in
ERROR? (A "word" is any list of letters, pronounceable or not.) If you are familiar with the game of
Scrabble®, you can imagine that you have 5 tiles, namely one E, one O, and three R's. We cannot use
$5^k$ since unlimited repetition is not allowed. On the other hand, we cannot use $(5)_k$ since repetition
is allowed. At present, all we can do is carefully list the possibilities. Here they are in alphabetical
order.

k = 1:  E, O, R

k = 2:  EO, ER, OE, OR, RE, RO, RR

k = 3:  EOR, ERO, ERR, OER, ORE, ORR, REO, RER, ROE, ROR, RRE, RRO, RRR

k = 4:  EORR, EROR, ERRO, ERRR, OERR, ORER, ORRE, ORRR, REOR, RERO,
        RERR, ROER, RORE, RORR, RREO, RRER, RROE, RROR, RRRE, RRRO

k = 5:  EORRR, ERORR, ERROR, ERRRO, OERRR, ORERR, ORRER, ORRRE,
        REORR, REROR, RERRO, ROERR, RORER, RORRE, RREOR, RRERO,
        RROER, RRORE, RRREO, RRROE

This  is  obviously  a  tedious  process — try  it  with  ERRONEOUSNESS.  We  will  explore  better  methods 
in  Examples  1.19,  3.3  (p.  69),  and  11.6  (p.  319).  □ 
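
The tedious listing is exactly the kind of job a short program handles well. The sketch below treats the five tiles as distinct objects, lists every ordered selection of k tiles, and then discards duplicate words; the function name `words` is illustrative.

```python
from itertools import permutations

def words(tiles, k):
    """Sorted distinct k-letter words that use each tile at most once."""
    return sorted({"".join(p) for p in permutations(tiles, k)})

counts = [len(words("ERROR", k)) for k in range(1, 6)]
print(counts)
print(words("ERROR", 2))
```

This brute-force approach inspects n!/(n-k)! tile selections, so it slows down quickly for a long word such as ERRONEOUSNESS.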

Example  1.12  Circular  arrangements  How  many  ways  can  n  people  be  seated  on  a  Ferris 
wheel  with  exactly  one  person  in  each  seat?  Equivalently,  we  can  think  of  this  as  seating  the  people 
at  a  circular  table  with  n  chairs.  Two  seatings  are  defined  to  be  "the  same"  if  one  can  be  obtained 
from  the  other  by  rotating  the  Ferris  wheel  (or  rotating  the  seats  around  the  table). 

If  the  people  were  seated  in  a  straight  line  instead  of  in  a  circle,  the  answer  would  be  n!.  Can  we 
convert  the  circular  seating  into  a  linear  seating  (i.e.,  an  ordered  list)?  In  other  words,  can  we  convert 
the  unsolved  problem  to  a  solved  one?  Certainly — simply  cut  the  circular  arrangement  between  two 
people  and  unroll  it.  Thus,  to  arrange  n  people  in  a  linear  ordering, 

first  arrange  them  in  a  circle    AND    then  cut  the  circle. 

According  to  our  AND/OR  technique,  we  must  prove  that  each  linear  arrangement  arises  in 
exactly  one  way  with  this  process. 

•  Since  a  linear  seating  can  be  rolled  up  into  a  circular  seating,  it  can  also  be  obtained  by  unrolling 
that  circular  seating.  Hence  each  linear  seating  arises  at  least  once. 

•  Since the people at the circular table are all different, the place we cut the circle determines
who  the  first  person  in  the  linear  seating  is,  so  each  cutting  of  a  circular  seating  gives  a  different 
linear  seating.  Obviously  two  different  circular  seatings  cannot  give  the  same  linear  seating. 
Hence  each  linear  seating  arises  at  most  once. 
Figure 1.1 Some circular arrangements with the corresponding linear arrangements.


Putting these two observations together, we see that each linear seating arises exactly once. By the
Rule  of  Product, 

n!  =  (number  of  circular  arrangements)  x 
(number  of  places  to  cut  the  circle). 

Hence  the  number  of  circular  arrangements  is  n!/n  =  (n  —  1)!. 

Our  argument  was  somewhat  indirect.  We  can  derive  the  result  by  a  more  direct  argument.  For 
convenience,  let  the  people  be  called  1  through  n.  We  can  read  off  the  people  in  the  circular  list 
starting  with  person  1.  This  gives  a  linear  ordering  of  n  that  starts  with  1.  Conversely,  each  such 
linear  ordering  gives  rise  to  a  circular  ordering.  Thus  the  number  of  circular  orderings  equals  the 
number  of  such  linear  orderings.  Having  listed  person  1,  there  are  (n  —  1)!  ways  to  list  the  remaining 
n  —  1  people.  Thus  the  number  of  circular  arrangements  is  (n  —  1)!. 
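
Both arguments can be checked by enumeration for small n. In this sketch, two linear orderings are treated as the same circular arrangement when one is a rotation of the other; the smallest rotation serves as a name for the class. The function names are illustrative.

```python
from itertools import permutations
from math import factorial

def canonical_rotation(seating):
    """Smallest rotation of a tuple, used as the name of its circular class."""
    n = len(seating)
    return min(seating[i:] + seating[:i] for i in range(n))

def circular_arrangements(n):
    """Count seatings of people 1..n around a circle, up to rotation."""
    people = tuple(range(1, n + 1))
    return len({canonical_rotation(p) for p in permutations(people)})

for n in range(1, 7):
    print(n, circular_arrangements(n), factorial(n - 1))
```

Because the people are distinct, the smallest rotation is always the one that starts with person 1, which mirrors the direct argument in the text.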

If we are making circular necklaces using n distinct beads, then the arguments we have just
given prove that there are $(n-1)!$ possible necklaces provided we are not allowed to flip necklaces
over. What happens if the beads are not distinct?

The direct method fails if there are multiple copies of bead 1 because we don't know where to
start reading. What about the indirect method? The different cuttings of the circular arrangement
may not be distinct. Let's have a look at an example to see why. We'll take a circular arrangement
with six "places" and put beads of type 1 and 2 around the circle, where we can use any number
of each of the two types of beads. In Figure 1.1 are some distinct necklaces and, next to each, the
distinct linear arrangements we get by unrolling. There are $2^6$ different linear arrangements. Since
some necklaces have fewer than six unrollings, $2^6/6$ is an underestimate of the number of necklaces.

We can describe what we're doing as follows: Call two lists (i.e., linear arrangements) "equivalent"
if one can be gotten from the other by "circularly permuting" the elements; that is, by shifting
everything down some fixed number of positions and putting what is shifted off the end at the
beginning. The lists fall into sets of equivalent lists, each set corresponding to one circular seating.
Figure 1.1 can be thought of as containing six such sets of equivalent lists. The number of necklaces
is the number of sets of equivalent lists.
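
For the six-place, two-bead example the sets of equivalent lists are small enough to compute directly. The sketch below groups the 64 lists by their smallest rotation and reports how many sets there are and how large each one is; the names are illustrative.

```python
from itertools import product

def canonical_rotation(beads):
    """Smallest rotation of a tuple, used as the name of its equivalence class."""
    n = len(beads)
    return min(beads[i:] + beads[:i] for i in range(n))

lists = list(product((1, 2), repeat=6))        # all linear arrangements
classes = {}
for lst in lists:
    classes.setdefault(canonical_rotation(lst), set()).add(lst)

necklaces = len(classes)                        # sets of equivalent lists
sizes = sorted(len(members) for members in classes.values())
print(necklaces, len(lists) / 6, sizes)
```

The class sizes show why dividing by six undercounts: a list such as 1, 2, 1, 2, 1, 2 has only two distinct rotations, so its set contributes less than six lists.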

Although  we  will  not  study  tools  for  dealing  with  problems  having  equivalences  until  Chapter  4, 
there  is  one  important  class  of  problems  with  equivalences  that  we  can  deal  with  now.  Suppose  we 
allow  the  list  entries  to  be  rearranged  in  any  fashion;  in  other  words,  we  want  to  count  unordered 
lists.  We'll  take  up  this  subject  in  the  next  section.  □ 


1.2     Lists  with  Repetitions  Forbidden 




Our  first  derivation  of  the  formula,  n!/n,  for  seating  n  people  at  a  circular  table  illustrates  an 
important  but  obvious  principle: 

No  matter  how  you  count  a  set,  the  number  is  always  the  same. 

For  circular  arrangements,  we  counted  the  set  of  linear  arrangements  in  two  ways.  Another  obvious 
principle  is 

If there is a one-to-one correspondence between two sets, then they are the same size.

This  can  be  used  to  show  that  two  counting  problems  have  the  same  answer.  In  the  next  example 
we  consider  a  famous  example  of  this — the  Catalan  numbers,  which  arise  in  a  variety  of  counting 
problems. 

Example 1.13 Catalan numbers Suppose we have an election between two candidates and
the ballots are counted one-by-one. Further suppose that the first candidate is never behind (she's
always ahead or tied), but that the final count ends in a tie with each candidate getting n votes.
How many ways can this happen? The answer is called the Catalan number $C_n$. We are looking at
ordered lists that contain n ones and n twos such that, for all k, the number of twos in the first
k elements is at most k/2. The lists for $n \le 3$ are

12       1122    1212       111222    112122    112212    121122    121212

and so $C_1 = 1$, $C_2 = 2$, $C_3 = 5$. In general $C_n = \frac{1}{n+1}\binom{2n}{n}$. We won't derive the formula for $C_n$ now,
but we want to look at other problems that have the same answer. (If you look up Catalan numbers
in the index, you can find a derivation of the formula in the text as well as other problems that have
the same answer.)
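
The small cases can be generated mechanically. This sketch lists every sequence of n ones and n twos in which no initial segment has more twos than ones, and compares the count with the standard closed form $\binom{2n}{n}/(n+1)$ for the Catalan numbers; the function names are illustrative.

```python
from itertools import product
from math import comb

def ballot_lists(n):
    """All lists of n ones and n twos in which no prefix has more twos than ones."""
    result = []
    for votes in product("12", repeat=2 * n):
        if votes.count("1") != n:
            continue
        lead = 0                      # (ones so far) - (twos so far)
        for v in votes:
            lead += 1 if v == "1" else -1
            if lead < 0:              # the first candidate fell behind
                break
        else:
            result.append("".join(votes))
    return result

def catalan(n):
    """Closed form for the nth Catalan number."""
    return comb(2 * n, n) // (n + 1)

print(ballot_lists(3))
print([len(ballot_lists(n)) for n in range(1, 7)])
```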

In  computer  science  we  have  the  notion  of  a  stack.  This  is  an  ordered  list  with  two  operations: 

•  PUSH: Add an item to the end of the list.

•  POP:  Remove  an  item  from  the  end  of  the  list. 

It  is  illegal  to  attempt  to  "POP"  an  empty  stack.  How  many  ways  can  we  start  out  with  an  empty 
stack,  PUSH  and  POP  in  some  order  and  end  up  with  an  empty  stack  at  the  end?  There  must  be 
the  same  number  of  PUSHs  and  POPs.  Suppose  there  are  n  of  each.  You  should  be  able  to  convince 
yourself that this is the same as the election problem and so the answer is $C_n$.

Suppose we have n things we want to multiply together. In general, $ab \ne ba$ so order matters;
however, we can group them in any way we want. (This is true if the things being multiplied are
matrices.) For example, here are the ways we could group four things for multiplication.

a(b(cd))    a((bc)d)    (ab)(cd)    (a(bc))d    ((ab)c)d.

We  can  do  this  with  a  stack  using  one  of  two  operations: 

•  STORE:  PUSH  the  next  thing  onto  the  stack 

•  MULT:  POP  two  things  off  the  stack  and  PUSH  their  product  onto  the  stack. 
For example, to do a((bc)d) we would do

STORE, STORE, STORE, MULT, STORE, MULT, MULT.

There must be n STOREs to get all n items onto the stack. There must be $n - 1$ MULTs. Among
the first k operations, the number of STOREs must exceed the number of MULTs. (Can you see why the
last two statements are true?) Forgetting the first STORE, this is just the original voting problem
with $n - 1$ votes each. Thus the answer is $C_{n-1}$.
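
The correspondence between operation sequences and groupings can be explored with a small stack simulation. The sketch below tries every legal order of STORE and MULT and records the fully parenthesized product left on the stack; it assumes each item is a single character, and the function names are illustrative.

```python
def groupings(items):
    """Return (expression, operations) for every legal STORE/MULT sequence."""
    results = []

    def wrap(expr):
        # A single item needs no parentheses; a product does.
        return expr if len(expr) == 1 else "(" + expr + ")"

    def run(stack, next_item, ops):
        if next_item == len(items) and len(stack) == 1:
            results.append((stack[0], ops))
            return
        if next_item < len(items):        # STORE: PUSH the next thing
            run(stack + [items[next_item]], next_item + 1, ops + ["STORE"])
        if len(stack) >= 2:               # MULT: POP two, PUSH their product
            left, right = stack[-2], stack[-1]
            run(stack[:-2] + [wrap(left) + wrap(right)], next_item, ops + ["MULT"])

    run([], 0, [])
    return results

for expr, ops in groupings("abcd"):
    print(expr, ", ".join(ops))
```

Each grouping appears exactly once in the output, paired with the one operation sequence that produces it.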

A  regular  n-gon  can  be  cut  up  into  triangles  all  of  whose  vertices  are  vertices  of  the  n-gon.  To 
do  this,  one  must  draw  n  —  3  nonintersecting  diagonals.  We  call  this  a  "triangulation  of  the  n-gon." 
Here  are  the  five  triangulations  of  the  pentagon. 
Figure 1.2 The reduction of three of the five triangulations of the pentagon to multiplications of abcd. The three reductions shown end with ((ab)c)d, (ab)(cd) and (a(bc))d on the base.


We want to know how many triangulations there are for a regular n-gon. This is trickier than the
previous  correspondences.  First,  we  need  to  know  a  little  about  what  the  triangulations  look  like. 

It turns out that, for n > 3, every triangulation has $n - 3$ diagonals, $n - 2$ triangles and exactly
two triangles that contain two edges of the original n-gon. Actually, any of these three claims can
be used to prove the other two. To see this, suppose there are D diagonals and T triangles. Then
the triangles have a total of 3T edges. These edges come from the original n-gon and from both sides
of the diagonals. Thus $3T = n + 2D$. It is clear that every triangle contains either one or two edges
of the n-gon. Call the number of these triangles $T_1$ and $T_2$, respectively. Then $T_1 + T_2 = T$ and
$T_1 + 2T_2 = n$. In summary

$$3T = n + 2D \qquad T_1 + T_2 = T \qquad T_1 + 2T_2 = n.$$

We have three equations in the four unknowns D, T, $T_1$ and $T_2$. If any of these is known (e.g.,
$D = n - 3$), we can solve the equations for the other three. Which value should we determine so
that the others can be found?

We'll prove that $D = n - 3$. This is even true for 3-gons (triangles) since no diagonals are
needed. We'll use induction for n > 3. Suppose we are given a triangulation of an n-gon. Cut it
along any diagonal to split it into two polygons. Let the number of sides of the two polygons be $k_1$
and $k_2$. Since cutting along the diagonal has given us two new sides, $k_1 + k_2 = n + 2$. Notice that the
$k_1$-gon and $k_2$-gon are triangulated. By induction, the $k_1$-gon has $k_1 - 3$ diagonals and the $k_2$-gon
has $k_2 - 3$. Thus, counting the diagonal we cut along, the number of diagonals in the original n-gon
triangulation is

$$(k_1 - 3) + (k_2 - 3) + 1 = (k_1 + k_2) - 5 = (n + 2) - 5 = n - 3,$$

and the induction is complete.

We'll  now  describe  a  method  for  associating  a  multiplication  of  n  —  1  things  with  a  triangulation 
of an n-gon. Draw the n-gon with one side at the bottom. We'll call this side the "base". Label all
the sides except the base. (See the left side of Figure 1.2.) There are two triangles that have two
sides  belonging  to  the  n-gon.  Thus  there  must  be  a  triangle  with  two  labeled  sides.  Remove  the 
labeled  sides  and  place  the  product  of  their  labels  on  the  third  side.  Repeat  this  process  until  we  are 
left  with  a  labeled  base.  Figure  1.2  contains  examples. 

To  complete  the  process  we  need  to  know  that  this  gives  us  a  one-to-one  correspondence  between 
the  triangulations  and  the  multiplications.  Simply  write  the  multiplication  on  the  base  and  reverse 
the  steps.  In  other  words,  read  Figure  1.2  from  right  to  left  instead  of  from  left  to  right.  We  leave  it 
to you to convince yourself that every multiplication leads to a unique triangulation and vice versa.
Thus there are $C_{n-2}$ triangulations of a regular n-gon.

We have looked at only a few of the dozens of combinatorial interpretations of the Catalan
numbers. □
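
The triangulation count can also be confirmed by brute force for small polygons. The sketch below numbers the vertices 0 to n-1 around the polygon and looks for sets of n - 3 diagonals in which no two cross. It assumes, as a standard fact not proved in the text, that any such set cuts the polygon into triangles; the function names are illustrative.

```python
from itertools import combinations
from math import comb

def crosses(d1, d2):
    """True if diagonals (a, b) and (c, d), each with smaller endpoint first,
    cross in the interior of the polygon."""
    (a, b), (c, d) = d1, d2
    return a < c < b < d or c < a < d < b

def triangulations(n):
    """All sets of n - 3 pairwise noncrossing diagonals of a convex n-gon."""
    diagonals = [(i, j) for i, j in combinations(range(n), 2)
                 if j - i not in (1, n - 1)]        # skip the polygon's sides
    return [chosen for chosen in combinations(diagonals, n - 3)
            if not any(crosses(x, y) for x, y in combinations(chosen, 2))]

def catalan(m):
    """Closed form for the mth Catalan number."""
    return comb(2 * m, m) // (m + 1)

for n in range(3, 8):
    print(n, len(triangulations(n)), catalan(n - 2))
```

Diagonals that merely share a vertex do not count as crossing, which is why the comparisons in `crosses` are strict.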

Exercises 


In each of the exercises, indicate how you are using the Rules of Sum and Product. It is instructive to first do
these exercises using only the techniques introduced so far and then, after reading the next section, to return
to  these  exercises  and  look  for  other  ways  of  doing  them.  More  generally,  looking  back  at  earlier  sections  to 
get  a  new  viewpoint  is  often  helpful.  We  do  this  in  the  text  to  some  extent,  but  you  should  do  it  on  your 
own,  too. 

1.2.1.  Find  to  two  decimal  places  the  answer  to  the  birthday  question  asked  in  Example  1  (p.  1). 

Hint.  Assigning  birthdays  to  30  people  is  the  same  as  forming  an  ordered  list  of  30  dates. 

1.2.2.  Use  (1.2)  to  estimate  the  solution  to  the  birthday  problem  in  Example  1  (p.  1). 

1.2.3. How many ways are there to form an ordered list of two distinct letters from the set of letters in the
word  COMBINATORICS?  three  distinct  letters?  four  distinct  letters? 

1.2.4.  Repeat  the  previous  problem  when  the  letters  need  not  be  distinct  but  cannot  be  used  more  often 
than  they  appear  in  COMBINATORICS. 

1.2.5. We are interested in forming 3 letter words ("3-words") using the letters in LITTLEST. For the
purposes of the problem, a "word" is any ordered list of letters.

(a)  How  many  words  can  be  made  with  no  repeated  letters? 

(b)  How  many  words  can  be  made  with  unlimited  repetition  allowed? 

(c)  How  many  words  can  be  made  if  repeats  are  allowed  but  no  letter  can  be  used  more  often  than 
it  appears  in  LITTLEST? 

1.2.6. Redo the previous exercise for k-words. The last part should be starred. It can be done if you treat
each value of $k \le 8$ separately and carefully break it down into cases with OR. Even so, you should
study  the  next  section  before  you  attempt  it. 

1.2.7. Each of the following belongs to one of the four types of things described in Example 1.13. In each
case,  list  the  other  three  things  that  correspond  to  it  using  the  correspondences  in  the  example. 

(a)  1122112122 

(b) (a(bc))(((de)f)g)

(c) 

1.2.8. Suppose we have an election as in Example 1.13, but now the first candidate is always ahead except
for the 0-0 and n-n ties at the start and finish. How many ways can this happen?

1.2.9. By 2001 spelling has deteriorated considerably. The dictionary defines the spelling of "relief" to be
any combination (with repetition allowed) of the letters R, L, F, I and E subject to certain constraints
listed below. How many spellings are possible? The most popular spelling is the one that, in dictionary
order, is five before the spelling RELIEF. What is it?

(i)  The  number  of  letters  must  not  exceed  6. 

(ii)  The  word  must  contain  at  least  one  L. 

(iii)  The  word  must  begin  with  an  R  and  end  with  an  F. 

(iv)  There  is  just  one  R  and  one  F. 
1.2.10. By the year 2010, further deterioration in spelling has relaxed the last condition listed above so that
we can have any number of initial R's and any number of terminal F's, provided there is at least
one of each. How many spellings are possible? Which spelling is five before RELIEF in dictionary
order?

1.2.11.  Prove  that  the  number  of  ordered  lists  without  repeats  that  can  be  constructed  from  an  n-set  is 
very  nearly  n!e.  The  lists  can  be  of  any  length. 

Hint.  Recall that from Taylor's Theorem in calculus $e^x = 1 + x + x^2/2! + x^3/3! + \cdots$.

1.2.12.  In this exercise, we look at ways of seating n people at a long table that has n seats. In (c)-(e), n is
even.

Hint.  If  you  fix  a  corner  of  the  table  and  read  out  the  seating  arrangement  counterclockwise  starting 
at  that  corner,  you  have  an  ordered  list.  If  you  draw  pictures,  you  should  be  able  to  see  how  many 
ordered  lists  give  an  equivalent  seating  arrangement;  for  example,  by  reversing  right  and  left  in  (b). 

(a)  Suppose  that  everyone  is  to  be  seated  on  one  side  of  the  table.  How  many  ways  can  it  be  done? 

(b)  Suppose we don't care if left and right are interchanged; that is, seating A, B, C, ... from left to
right  will  be  considered  the  same  as  doing  it  from  right  to  left.  (This  is  reasonable  if  all  we  care 
about  is  who  a  person's  neighbors  are.)  How  many  ways  can  this  be  done? 

(c)  How  many  ways  can  it  be  done  if  n  is  even  and  half  the  people  are  seated  on  each  side  of  the 
table?  Assume  that  we  can  tell  the  two  sides  of  the  table  apart;  for  example,  one  side  faces  a 
wall  and  the  other  side  faces  into  the  room.  Also  assume  seating  left  to  right  is  different  from 
seating  right  to  left. 

(d)  Suppose we seat people on both sides as in (c) and all we care about is who a person's neighbors
are on each side, as in (b).

(e)  Suppose we are dealing with a seating as in (d), but now we also care about who is sitting
opposite a person as well as who a person's neighbors on each side are.


*1.2.13.  This  exercise  contains  several  related  questions.  In  each  case  we  would  like  a  formula  that  answers 
the  question  "How  many  ways  can  p  people  run  for  k  offices?"  under  the  given  constraints.  Unless 
the  constraints  say  otherwise,  a  person  may  run  for  no  offices.  At  present,  we  have  the  tools  to  do 
only  two  parts  of  this  exercise.  The  challenge  in  this  exercise  is  to  avoid  finding  wrong  "solutions"  to 
the parts that we are unable to do, as well as doing the two parts we can do now. One way you can
check  your  "solution"  is  to  actually  list  all  the  possible  ways  p  people  can  run  for  k  offices  for  each  of 
the  parts  for  some  small  values  of  p  and  k.  We  will  return  to  this  exercise  later  as  we  develop  tools 
for  doing  other  parts  of  it. 

(a)  Each  person  must  be  a  candidate  for  at  most  one  office. 

(b)  Each  person  must  be  a  candidate  for  exactly  one  office  and  each  office  must  have  at  least  one 
candidate. 

(c)  Each  person  must  be  a  candidate  for  at  most  one  office  and  each  office  must  have  at  least  one 
candidate. 

(d)  Each  person  can  be  a  candidate  for  any  number  of  offices  (including  none)  and  each  office  must 
have  at  least  one  candidate. 

(e)  Each  person  must  be  a  candidate  for  at  least  one  office  and  each  office  must  have  at  least  one 
candidate. 


*1.2.14.  In Example 1.12: How many are there of length 3 made from A's and B's? Length 5? Can you prove
a general result for all primes? What about allowing more than two kinds of letters?
People use C(n, k) to stand for the number of different k-subsets that can be formed from an
n-set. The notation $\binom{n}{k}$ is also frequently used. These are called binomial coefficients and are read
"n choose k." Think about how you might count k-subsets, that is, unordered k-lists.

*       *       *       Stop  and  think  about  this!        *       *  * 

You  may  have  concluded  that  this  seems  a  bit  trickier  to  do  than  counting  ordered  lists.  Can  we 
rephrase  the  problem  in  a  way  that  lets  us  solve  it,  or  convert  it  to  an  ordered  list  problem? 

•  An unordered k-list of distinct elements from a set S is simply a k-subset of S. This doesn't
seem to be of any help at present; however, we will generally think in terms of subsets rather
than unordered lists since the subset view is used more often in the literature.

•  If the original set consisted of something ordered, like the integers, we could introduce a "natural"
ordering to an unordered list, namely the one in which the elements are in increasing order (or,
if you prefer, decreasing order). Again this doesn't seem to help, but provides a possibly useful
interpretation.

•  We can adjust the previous idea a bit. Let's consider all possible orderings of our lists. This
is a way of constructing all ordered lists with distinct elements in two steps: First construct an
unordered list with no repeats, then order it. An unordered k-list with no repeats is simply a
k-set. We can order it by forming a k-list without repeats from it. By Theorem 1.4 (p. 11), we
know that this can be done in k! ways. By the Rule of Product, there are C(n, k) k! ordered k-lists
with no repeats. By Theorem 1.4 again, this number is n(n - 1) ··· (n - k + 1) = n!/(n - k)!.
Dividing by k!, we have

Theorem 1.6   Binomial coefficient formula   The value of the binomial coefficients is

$$\binom{n}{k} = C(n,k) = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{k!\,(n-k)!}.$$
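The two-step argument behind the formula (count ordered k-lists, then divide by the k! orderings of each k-subset) can be checked on small cases. The following is only a sketch; the function names are ours, not the text's.

```python
from itertools import combinations
from math import factorial


def binomial_by_formula(n: int, k: int) -> int:
    """C(n, k) = n(n-1)...(n-k+1) / k!, computed with exact integers."""
    if k < 0 or k > n:
        return 0
    falling = 1
    for i in range(k):
        falling *= n - i  # number of ordered k-lists without repeats
    return falling // factorial(k)  # each k-subset was counted k! times


def binomial_by_listing(n: int, k: int) -> int:
    """Count the k-subsets of {0, ..., n-1} by listing every one of them."""
    return sum(1 for _ in combinations(range(n), k))
```

Listing is hopeless for large n, which is exactly why the formula is useful; the listing function is there only to confirm the formula on small inputs.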


Example 1.14  A generating function for binomial coefficients  We'll now approach the
problem of evaluating C(n, k) in another way. In other words, we'll "forget" the formula we just
derived and start over with a new approach.

You  may  ask  "Why  waste  time  using  another  approach  when  we've  already  gotten  what  we 
want?"  We  gave  a  partial  answer  to  this  earlier.  Here  is  a  more  complete  response. 

•  By  looking  at  a  problem  from  different  viewpoints,  we  may  come  to  understand  it  better  and  so 
be  more  comfortable  working  similar  problems  in  the  future. 

•  By  looking  at  a  problem  from  different  viewpoints,  we  may  discover  that  things  we  previously 
thought  were  unrelated  have  interesting  connections.  These  connections  might  open  up  easier 
ways  to  solve  some  types  of  problems  and  may  make  it  possible  for  us  to  solve  problems  we 
couldn't  do  before. 

•  A different point of view may lead us to a whole new approach to problems, putting powerful
new tools at our disposal.

In  the  approach  we  are  about  to  take,  we'll  begin  to  see  a  powerful  tool  for  solving  counting 
problems.  It's  called  "generating  functions"  and  it  lets  us  put  calculus  and  related  subjects  to  work 
in  combinatorics.  In  later  chapters,  we'll  devote  more  time  to  generating  functions.  Now,  we'll  just 
get  a  brief  glimpse  of  them. 

Suppose that $S = \{x_1, \ldots, x_n\}$ where $x_1$, $x_2$, ... and $x_n$ are variables as in high school algebra.
Let $P(S) = (1 + x_1) \cdots (1 + x_n)$. The first three values of P(S) are

n = 1:   $1 + x_1$

n = 2:   $1 + x_1 + x_2 + x_1 x_2$

n = 3:   $1 + x_1 + x_2 + x_3 + x_1 x_2 + x_1 x_3 + x_2 x_3 + x_1 x_2 x_3$.


From this you should be able to convince yourself that P(S) consists of a sum of terms where
each term represents one of the subsets of S as a product of its elements. Can we reach some
understanding of why this is so? Yes, but we'll only explore it briefly now. The understanding
relates to the Rules of Sum and Product. Interpret plus as OR, times as AND and 1 as "nothing."
Then $(1 + x_1)(1 + x_2)(1 + x_3)$ can be read as

•  include the factor 1 in the term OR include the factor $x_1$ AND

•  include the factor 1 in the term OR include the factor $x_2$ AND

•  include the factor 1 in the term OR include the factor $x_3$.

This is simply a description of how to form an arbitrary subset of $\{x_1, x_2, x_3\}$. On the other hand
we can form an arbitrary subset by the rule

•  Include nothing in the subset OR

•  include $x_1$ in the subset OR

•  include $x_2$ in the subset OR

•  include $x_3$ in the subset OR

•  include $x_1$ AND $x_2$ in the subset OR

•  include $x_1$ AND $x_3$ in the subset OR

•  include $x_2$ AND $x_3$ in the subset OR

•  include $x_1$ AND $x_2$ AND $x_3$ in the subset.

If we drop the subscripts on the $x_i$'s, then a product representing a k-subset becomes $x^k$. We
get one such term for each subset and so it follows that the coefficient of $x^k$ in the polynomial
$f(x) = (1 + x)^n$ is C(n, k); that is,

$$(1 + x)^n = \sum_{k=0}^{n} C(n,k)\,x^k. \tag{1.3}$$

Can this help us evaluate C(n, k)? Calculus comes to the rescue! Remember Taylor's Theorem?
It tells us that the coefficient of $x^k$ in f(x) is $f^{(k)}(0)/k!$. Let $f(x) = (1 + x)^n$. You should be able to
prove by induction on k that

$$f^{(k)}(x) = n(n-1)\cdots(n-k+1)\,(1 + x)^{n-k}.$$

Thus C(n, k), the coefficient of $x^k$ in $(1 + x)^n$, is

$$C(n,k) = \frac{f^{(k)}(0)}{k!} = \frac{n(n-1)\cdots(n-k+1)}{k!}.$$


We conclude this example with a useful formula that follows from (1.3). Since $(x + y)^n =
x^n (1 + (y/x))^n$, it follows that the coefficient of $x^n (y/x)^k$ in $(x + y)^n$ is C(n, k). This gives us the

Theorem 1.7   Binomial Theorem

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k.$$


The  expressions  we've  been  studying  are  called  generating  functions.  □ 
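As a small check of (1.3), we can expand $(1 + x)^n$ by repeated multiplication and read off the coefficients. This sketch stores a polynomial as a list of coefficients, lowest power first; that representation and the function name are our own choices.

```python
from math import comb


def expand_one_plus_x(n: int) -> list[int]:
    """Return the coefficients of (1 + x)^n, lowest power first."""
    coeffs = [1]  # the constant polynomial 1
    for _ in range(n):
        # Multiplying by (1 + x) gives new[k] = old[k] + old[k-1]:
        # either take the factor 1 (keep the power) OR take x (raise it).
        coeffs = [a + b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs
```

For example, `expand_one_plus_x(4)` returns `[1, 4, 6, 4, 1]`, which agrees with `comb(4, k)` for k = 0, ..., 4.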
Example 1.15  Card hands: Full house  Card hands provide a source of some simple sounding
but tricky set counting problems. A standard deck of cards contains 52 cards, each of which is marked
with two labels. The first label, called the "suit," belongs to the set

{♣, ♡, ♢, ♠}.

The second label, called the "value," belongs to the set

{2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A}.

Each  pair  of  labels  occurs  exactly  once  in  the  deck.  A  hand  is  a  subset  of  a  deck.  Two  cards  are  a 
pair  if  they  have  the  same  values. 

How  many  5  card  hands  consist  of  a  pair  and  a  triple?  (In  poker,  such  a  hand  is  called  a  full 
house.) 

To  calculate  this  we  describe  how  to  construct  such  a  hand: 

•  Choose  the  value  for  the  pair  AND 

•  Choose  the  value  for  the  triple  different  from  the  pair  AND 

•  Choose  the  2  suits  for  the  pair  AND 

•  Choose  the  3  suits  for  the  triple. 

This  produces  each  full  house  exactly  once,  so  the  number  is  the  product  of  the  answers  for  the  four 
steps,  namely 

13 × 12 × C(4, 2) × C(4, 3) = 3,744.

What is the probability of being dealt a full house? There are $\binom{52}{5}$ distinct hands of cards so we
could  simply  divide  the  previous  answer  by  this  number.  This  approach  looks  at  the  result  of  the 
deal  rather  than  the  actual  deal.  Why  do  we  say  that?  When  a  hand  of  cards  is  dealt,  the  order  in 
which  you  receive  the  cards  matters.  Thus: 

•  If  we  look  at  the  resulting  hand,  then  the  order  of  the  cards  doesn't  matter.  That's  the  way  we 
just  got  the  answer. 

•  If we look at the dealing process, then the order of the cards matters. We'll do the problem that
way next.

Regard each of the 52 × 51 × 50 × 49 × 48 ways of dealing five cards from 52 as equally likely. Then we
should divide this into the number of ways of being dealt a full house. Since all the cards in a hand
of five cards are different, they can be ordered in 5! ways. Hence the probability of being dealt a full
house is

$$\frac{3{,}744 \times 5!}{52 \times 51 \times 50 \times 49 \times 48},$$

which gives the same answer as before, $3{,}744 / \binom{52}{5}$.

Let's  phrase  these  in  terms  of  probability  spaces.  We'll  use  the  uniform  distribution  on  both 
spaces. 

•  Resulting hand:    The space contains all $\binom{52}{5}$ 5-card subsets of the 52-card deck.

•  Dealing  process:    The  space  contains  all  52  x  51  x  50  x  49  x  48  5-card  lists  without  repeats  that 
can  be  made  from  a  52-card  deck. 

Which approach should you use? That's up to you. However, whichever approach you choose,
you may have problems if there is more than one copy of a card. For example, one might add two
jokers to a deck or one might combine two identical decks as in canasta. In this case, it's probably
easiest and safest to pretend that the cards have been marked so you can tell them apart; for example,
call them joker-1 and joker-2. □
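The agreement between the two probability spaces is easy to confirm with exact arithmetic. The sketch below uses Python's `fractions.Fraction`; the variable names are ours.

```python
from fractions import Fraction
from math import comb, factorial, perm

# Number of full houses: value of pair, value of triple, suits of each.
full_houses = 13 * 12 * comb(4, 2) * comb(4, 3)

# Resulting-hand view: the space is all 5-card subsets of the deck.
p_hand = Fraction(full_houses, comb(52, 5))

# Dealing-process view: the space is all ordered 5-lists without repeats,
# and each full house can be dealt in 5! different orders.
ordered_full_houses = full_houses * factorial(5)
p_deal = Fraction(ordered_full_houses, perm(52, 5))
```

Both `p_hand` and `p_deal` reduce to the same fraction, 6/4165, or roughly 0.00144.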
Example 1.16  Card hands: Two pairs  We'll continue with our poker hands. How many 5 card
hands  consist  of  two  pairs?  A  description  of  a  hand  always  means  that  there  is  nothing  better  in  the 
hand,  so  "two  pairs"  means  we  don't  have  a  full  house  or  four  of  a  kind. 

One  thing  we  might  try  is  to  go  back  to  the  preceding  example's  description  of  how  to  construct 
a full house and make two simple changes: (a) replace "triple" by "second pair" and (b) add a choice
for  the  card  that  belongs  to  no  pair.  This  is  wrong!  Each  hand  is  constructed  twice,  depending  on 
which  pair  is  the  "second  pair."  Try  it!  What  happened?  Before  choosing  the  cards  for  a  pair  and 
a  triple,  we  can  distinguish  the  pair  from  the  triple  because  one  contains  two  cards  and  the  other 
contains  three.  We  can't  distinguish  the  two  pairs,  though,  until  the  values  are  specified.  This  is  an 
example  of  a  situation  where  we  can  easily  make  mistakes  if  we  forget  that  "AND"  means  "AND 
then."  Here's  a  correct  description,  with  "then"  put  in  for  emphasis. 

•  Choose  the  values  for  the  two  pairs  AND  then 

•  Choose  the  2  suits  for  the  pair  with  the  larger  value  AND  then 

•  Choose  the  2  suits  for  the  pair  with  the  smaller  value  AND  then 

•  Choose the remaining card from the 4 × 11 cards that have different values than the pairs.

The value is

$$\binom{13}{2} \times \binom{4}{2} \times \binom{4}{2} \times 44 = 123{,}552.$$


You may find what we've just been through disquieting: How can you decide between
distinguishable and indistinguishable? The answer is simple: Draw a picture of the cards and fill in the
information  after  each  step.  Let's  do  this  for  the  full  house  and  the  two  pair  problems.  To  begin 
with,  we  have  five  blank  cards.  For  the  full  house,  we  divide  the  cards  up  into  a  pair  and  a  triple: 


We  can  tell  the  two  groups  apart,  so  it  makes  sense  to  talk  about  assigning  a  value  to  the  pair,  say  9, 
and  to  the  triple,  say  7,  to  obtain 


We  can't  tell  the  two  nines  apart,  so  all  we  can  do  is  choose  a  subset  of  two  suits  to  assign  to  them; 
likewise for the triple. We might choose a set of two suits for the nines and a set of three suits for the sevens and obtain


Now  look  at  the  case  of  two  pairs.  We  have 
Since we can't distinguish the two pairs, all we can do is choose a set of two values, say {7, 9} and
put them on the cards:


Now we can distinguish between the pairs. For the pair of sevens we might choose one set of two
suits, and for the nines, another. As a result, we have


.  □ 
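One way to convince yourself that the "second pair" description counts every hand twice is to list hands by brute force. Listing all hands from a 52-card deck is slow in plain Python, so this sketch assumes a reduced deck with `num_values` face values and 4 suits (our simplification, not part of the example) and compares the list with both counts.

```python
from collections import Counter
from itertools import combinations
from math import comb


def count_two_pairs_brute(num_values: int) -> int:
    """List every 5-card hand and count those with exactly two pairs."""
    deck = [(value, suit) for value in range(num_values) for suit in range(4)]
    total = 0
    for hand in combinations(deck, 5):
        # How often each value occurs; two pairs has multiplicities 1, 2, 2.
        shape = sorted(Counter(value for value, _ in hand).values())
        if shape == [1, 2, 2]:
            total += 1
    return total


def count_two_pairs_formula(num_values: int) -> int:
    """Choose a SET of two pair values, suits for each pair, then the odd card."""
    return comb(num_values, 2) * comb(4, 2) ** 2 * 4 * (num_values - 2)


def count_two_pairs_naive(num_values: int) -> int:
    """The tempting but wrong count that picks a 'first' and a 'second' pair."""
    return num_values * (num_values - 1) * comb(4, 2) ** 2 * 4 * (num_values - 2)
```

With `num_values = 5` the brute-force count matches the formula, while the naive count is exactly twice as large. With `num_values = 13` the formula gives 123,552.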


Example 1.17   Smorgasbord College programs   Smorgasbord College allows students to
study in three principal areas: (a) Swiss naval history, (b) elementary theory and (c) computer
science. The numbers of upper division courses offered in these fields are 2, 92, and 15 respectively.
To graduate a student must choose a major and take 6 upper division courses in it, and also choose
a minor and take 2 upper division courses in it. Swiss naval history cannot be a major because only
2 upper division courses are offered in it.
How many programs are possible?

The possible major-minor pairs are b-a, b-c, c-a, and c-b. By the Rule of Sum we can simply
add  up  the  number  of  programs  in  each  combination.  Those  programs  can  be  found  by  the  Rule  of 
Product.  The  number  of  major  programs  in  (b)  is  C(92,  6)  and  in  (c)  is  C(15, 6).  For  minor  programs: 
(a)  is  C(2,2)  =  1,  (b)  is  C(92,2)  =  4186  and  (c)  is  C(15,2)  =  105.  Since  the  possible  programs  are 
constructed  by 


    ( major (b) AND ( minor (a) OR minor (c) ) )
    OR ( major (c) AND ( minor (a) OR minor (b) ) )

the number of possible programs is

$$\binom{92}{6}(1 + 105) + \binom{15}{6}(1 + 4186) = 75{,}606{,}201{,}671,$$

a rather large number. □
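The OR/AND structure of the count translates directly into a sum of products. In this sketch the dictionary of course counts and the constant names are ours; the numbers are those given in the example.

```python
from math import comb

# Upper division courses offered in areas (a), (b) and (c).
courses = {"a": 2, "b": 92, "c": 15}
MAJOR_SIZE, MINOR_SIZE = 6, 2

total_programs = 0
for major, n_major in courses.items():
    if n_major < MAJOR_SIZE:
        continue  # too few courses to be a major (Swiss naval history)
    # Rule of Sum over the possible minors (any area other than the major) ...
    minor_choices = sum(
        comb(n, MINOR_SIZE) for area, n in courses.items() if area != major
    )
    # ... and Rule of Product for choosing the major courses AND a minor.
    total_programs += comb(n_major, MAJOR_SIZE) * minor_choices
```

After the loop, `total_programs` equals 75,606,201,671.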


Example 1.18    Multinomial coefficients     Suppose we are given k boxes labeled 1 through k
and a set S and are told to distribute the elements of S among the boxes so that the ith box contains
exactly $m_i$ elements. How many ways can this be done?

Let n = |S|. Unless $m_1 + \cdots + m_k = n$, the answer is zero because we don't have the right
number of objects. Therefore, we assume from now on that

$$m_1 + \cdots + m_k = n.$$

Here's a way to describe filling the boxes.

•  Fill the first box. (There are $C(n, m_1)$ ways.) AND

•  Fill the second box. (There are $C(n - m_1, m_2)$ ways.) AND

   ⋮

•  Fill the kth box. (There are $C(n - (m_1 + \cdots + m_{k-1}), m_k) = C(m_k, m_k) = 1$ ways.)
Now apply the Rule of Product, use the formula C(p, q) = p!/(q! (p - q)!) everywhere, and cancel common
factors in numerator and denominator to obtain $n!/(m_1!\,m_2! \cdots m_k!)$. This is called a multinomial
coefficient and is written

$$\binom{n}{m_1, m_2, \ldots, m_k} = \frac{n!}{m_1!\,m_2! \cdots m_k!}, \tag{1.4}$$

where $n = m_1 + m_2 + \cdots + m_k$. In multinomial notation, the binomial coefficient $\binom{n}{k}$ would be
written $\binom{n}{k,\,n-k}$. Simply think of the first box as the k things that are chosen and the second box
as the n - k things that are not chosen.

Before  you  read  on,  try  to  think  of  an  ordered  list  interpretation  for  the  multinomial  coefficient. 

*       *       *       Stop  and  think  about  this!        *       *  * 

Think of the objects being distributed as positions in a word and the boxes as letters. If the object
"position 3" is placed in the box "D," then the letter D is the third letter in the word. The multinomial
coefficient is then the number of words that can be made so that letter i appears exactly $m_i$ times.
A word can be thought of as an ordered list of its letters. □
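The box-filling product, the closed formula (1.4) and the word interpretation can all be compared on a small case. This is a sketch with function names of our own choosing; the word count lists all rearrangements, so it is only practical for short words.

```python
from itertools import permutations
from math import comb, factorial


def multinomial(*parts: int) -> int:
    """n! / (m_1! m_2! ... m_k!) where n is the sum of the parts."""
    result = factorial(sum(parts))
    for m in parts:
        result //= factorial(m)  # stays an exact integer at every step
    return result


def multinomial_by_boxes(*parts: int) -> int:
    """Fill box 1 AND box 2 AND ...: multiply the binomial coefficients."""
    remaining = sum(parts)
    ways = 1
    for m in parts:
        ways *= comb(remaining, m)  # choose which remaining objects go here
        remaining -= m
    return ways


def count_words(letters: str) -> int:
    """Count distinct rearrangements of the given letters by listing them."""
    return len(set(permutations(letters)))
```

For the letters AABC, all three approaches give 12: `multinomial(2, 1, 1)`, `multinomial_by_boxes(2, 1, 1)` and `count_words("AABC")`.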

Example  1.19  Words  from  a  collection  of  letters  Using  the  idea  at  the  end  of  the  previous 
example,  we  can  more  easily  count  the  words  that  can  be  made  from  ERROR,  a  problem  discussed 
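in Example 1.11 (p. 13). Suppose we want to make words of length k. Let $m_1$ be the number of E's,
$m_2$ the number of O's and $m_3$ the number of R's. By considering all possible cases for the number of
each letter, you should be able to see that the answer is the sum of $\binom{k}{m_1, m_2, m_3}$ over all $m_1, m_2, m_3$
such that

$$m_1 + m_2 + m_3 = k, \quad 0 \le m_1 \le 1, \quad 0 \le m_2 \le 1, \quad 0 \le m_3 \le 3.$$

Thus we obtain

$$\begin{aligned}
k = 1:&\quad \binom{1}{0,0,1} + \binom{1}{0,1,0} + \binom{1}{1,0,0} = 3 \\
k = 2:&\quad \binom{2}{0,0,2} + \binom{2}{0,1,1} + \binom{2}{1,0,1} + \binom{2}{1,1,0} = 7 \\
k = 3:&\quad \binom{3}{0,0,3} + \binom{3}{0,1,2} + \binom{3}{1,0,2} + \binom{3}{1,1,1} = 13 \\
k = 4:&\quad \binom{4}{0,1,3} + \binom{4}{1,0,3} + \binom{4}{1,1,2} = 20 \\
k = 5:&\quad \binom{5}{1,1,3} = 20.
\end{aligned}$$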

This  is  better  than  in  Example  1.11.  Instead  of  having  to  list  words,  we  have  to  list  triples  of 
numbers and each triple generally corresponds to more than one word. Here are the lists for the
preceding computations:

k = 1:    0,0,1  0,1,0  1,0,0

k = 2:    0,0,2  0,1,1  1,0,1  1,1,0

k = 3:    0,0,3  0,1,2  1,0,2  1,1,1

k = 4:    0,1,3  1,0,3  1,1,2

k = 5:    1,1,3

In  Example  3.3  (p.  69),  we  will  see  how  to  do  this  more  systematically  and  efficiently.  □ 
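The sum over triples is mechanical enough to hand to a short program. The sketch below (the function names and the `limits` parameter are ours) sums multinomial coefficients over the admissible triples and, as a check, also lists the words directly.

```python
from itertools import permutations, product
from math import factorial


def multinomial(*parts: int) -> int:
    """n! / (m_1! ... m_k!) where n is the sum of the parts."""
    result = factorial(sum(parts))
    for m in parts:
        result //= factorial(m)
    return result


def count_by_triples(k: int, limits=(1, 1, 3)) -> int:
    """Sum multinomial coefficients over (m1, m2, m3) with m1 + m2 + m3 = k.

    The limits are the available copies of E, O and R in ERROR.
    """
    total = 0
    for triple in product(*(range(limit + 1) for limit in limits)):
        if sum(triple) == k:
            total += multinomial(*triple)
    return total


def count_by_listing(k: int, letters: str = "ERROR") -> int:
    """Count distinct k-words by listing them, as in Example 1.11."""
    return len(set(permutations(letters, k)))
```

For k = 1, ..., 5 both functions give 3, 7, 13, 20 and 20.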
Example 1.20   Card hands and multinomial coefficients  We'll redo Examples 1.15 and 1.16,
and then discuss the general situation using multinomial coefficients.

To form a full house, we must choose a face value for the triple, choose a face value for the pair,
and leave eleven face values unused. This can be done in $\binom{13}{1,1,11}$ ways. We then choose the suits for
the triple in $\binom{4}{3}$ ways and the suits for the pair in $\binom{4}{2}$ ways.

To form two pair, we must choose two face values for the pairs, choose a face value for the single
card, and leave ten face values unused. This can be done in $\binom{13}{2,1,10}$ ways. We then choose suits for
each of the face values in turn, so we must multiply by $\binom{4}{2}\binom{4}{2}\binom{4}{1}$.

Imagine an eleven card hand containing two triples, a pair and three single cards. You should
be able to see that the number of ways to do this is

$$\binom{13}{2,1,3,7}\binom{4}{3}^{2}\binom{4}{2}\binom{4}{1}^{3}.$$

Let's do the general case. Suppose our hand must contain $c_1$ singles, $c_2$ pairs, $c_3$ triples and $c_4$
four-of-a-kinds. The number of such hands is

$$\binom{13}{c_1, c_2, c_3, c_4, k}\binom{4}{1}^{c_1}\binom{4}{2}^{c_2}\binom{4}{3}^{c_3}\binom{4}{4}^{c_4},$$

where $k = 13 - c_1 - c_2 - c_3 - c_4$ is the number of face values not in the hand. □
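The general formula is short enough to write as a single function, which can then be checked against the full house and two pair counts found earlier. This is a sketch; the function name and argument order are our own.

```python
from math import comb, factorial


def count_hands(c1: int, c2: int, c3: int, c4: int) -> int:
    """Hands with c1 singles, c2 pairs, c3 triples and c4 four-of-a-kinds."""
    unused = 13 - c1 - c2 - c3 - c4  # face values not in the hand
    if unused < 0:
        return 0
    # Multinomial coefficient: distribute the 13 face values among the roles.
    ways_values = factorial(13)
    for part in (c1, c2, c3, c4, unused):
        ways_values //= factorial(part)
    # Choose the suits for each single, pair, triple and four-of-a-kind.
    ways_suits = (
        comb(4, 1) ** c1 * comb(4, 2) ** c2 * comb(4, 3) ** c3 * comb(4, 4) ** c4
    )
    return ways_values * ways_suits
```

Here `count_hands(0, 1, 1, 0)` gives the 3,744 full houses and `count_hands(1, 2, 0, 0)` gives the 123,552 two pair hands.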

Example 1.21    Choosing Teams     Given 22 people, how many ways can we divide them into 4
teams of 5 players each plus 2 referees? If the teams and referees were labeled, the answer would be
$\binom{22}{5,5,5,5,1,1}$. Given 4 different teams and two referees, there are 4! ways to label the teams as Team 1,
2, 3, and 4, and there are 2 ways to label the referees, so the answer is

$$\binom{22}{5,5,5,5,1,1}\,\frac{1}{4! \times 2} = \frac{22!}{2\,(5!)^4\,4!}.$$

Suppose now we must divide up the teams into pairs that compete against each other, and
we assign a referee to each pair. If they were called Match #1 and Match #2, we could fill out
Match #1 by choosing 2 of the 4 teams and 1 of the referees. Those left are Match #2. This gives
us $\binom{4}{2} \times 2 = 12$. Thus we have $\frac{22!}{2\,(5!)^4\,4!} \times 12$. Of course, there isn't really a Match #1 and Match #2,
but there are two ways to assign match labels and so we must divide the answer we just got by 2. □
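The pattern in this example (count with labels, then divide by the number of ways to attach the labels) can be followed step by step with exact integers. The variable names in this sketch are ours.

```python
from math import comb, factorial

# Labeled teams and referees: the multinomial coefficient (22; 5,5,5,5,1,1).
labeled = factorial(22) // (factorial(5) ** 4 * factorial(1) ** 2)

# Remove the labels: 4! labelings of the teams, 2 labelings of the referees.
unlabeled = labeled // (factorial(4) * 2)

# Labeled matches: choose 2 of the 4 teams and 1 of the 2 referees for Match #1.
labeled_matches = comb(4, 2) * 2

# There is no real Match #1 and Match #2, so divide by the 2 match labelings.
with_matches = unlabeled * labeled_matches // 2
```

Each division is exact because every unlabeled arrangement corresponds to the same number of labeled ones.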

*Example 1.22  Incomparable sets  Let A be an n-set. By a Sperner family on A we mean
a family of subsets of A such that no subset in the family is contained in any other subset in the
family. For example, let A = {1, 2, 3, 4, 5}. Then

{1,2,4}       {1,5}       {2,4,5}  {3,5} 

is  a  Sperner  family  but 

{1,2,4}       {1,5}       {2,4}  {3,5} 

is  not. 

What  is  the  largest  number  of  subsets  that  we  can  have  in  a  Sperner  family  of  an  n-set? 

Clearly the family of all k-subsets of A is a Sperner family. Thus we can construct Sperner
families of size at least $\binom{n}{k}$. What value of k will make this as large as possible? One way to find the
value of k is to look at the ratio of $\binom{n}{k}$ to $\binom{n}{k-1}$. When this ratio exceeds 1, the sequence of binomial
coefficients is increasing and when it is less than 1 the sequence is decreasing. Since

$$\frac{\binom{n}{k}}{\binom{n}{k-1}} = \frac{n!}{(n-k)!\,k!} \cdot \frac{(n-k+1)!\,(k-1)!}{n!} = \frac{n-k+1}{k} = \frac{n+1}{k} - 1,$$

we see that the sequence is increasing when (n + 1)/k > 2 and is decreasing when (n + 1)/k < 2. It
follows that

$$\binom{n}{k} \text{ is a maximum at } \begin{cases} k = n/2, & \text{when } n \text{ is even;} \\ k = (n-1)/2 \text{ and } k = (n+1)/2, & \text{when } n \text{ is odd.} \end{cases}$$

$\lfloor x \rfloor$, the floor function, denotes the largest integer not exceeding x. With this, we can write our
conclusions in the form: There is a Sperner family of size $\binom{n}{\lfloor n/2 \rfloor}$, which can be obtained by taking
all $\lfloor n/2 \rfloor$-subsets of A.

Sperner proved that this result is best possible: there are no larger Sperner families. We now
present an adaptation of Lubell's proof of this result.

Call a k-set B an "initial part" of a list L if the first k elements of L are the elements of B. Let
S be a Sperner family on the n-set A. Consider an n-list L of the n elements of A. We claim that at
most one set in S can be an initial part of L, for if there were two such sets, one would correspond
to a longer initial part than the other and so contain the other as a subset.

On the other hand, a k-set B is the initial part of exactly k! (n - k)! n-lists. Why is this? The
first k elements of the list must be some arrangement of the elements of B, AND the remaining n - k
elements of the list must be some arrangement of the remaining n - k elements of A. Furthermore,
any list satisfying these conditions has B as an initial part. By the Rule of Product and Theorem 1.4
(p. 11), there are k! (n - k)! permutations which have B as an initial part. Adding this up over all B
in S, we obtain the number of rearrangements of A that have sets in S as initial parts. (This uses
the result from the previous paragraph that each list has at most one element of S as an initial part.)
Since there are n! lists, we have proved

$$\sum_{B \in S} |B|!\,(n - |B|)! \le n!.$$

Dividing by n! we obtain

$$\sum_{B \in S} \frac{1}{\binom{n}{|B|}} \le 1. \tag{1.5}$$

This inequality is the key to the proof. By our earlier work on the size of binomial coefficients, we
know that each term in the sum in (1.5) is at least as big as $1/\binom{n}{\lfloor n/2 \rfloor}$. Consequently, the sum in (1.5)
can have at most $\binom{n}{\lfloor n/2 \rfloor}$ terms. In other words, the size of the Sperner family is at most $\binom{n}{\lfloor n/2 \rfloor}$, and
we have already constructed Sperner families this big! This completes the proof. □
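For very small n the theorem can be confirmed by exhaustive search. The sketch below is our own illustration, not part of the proof. It tests the two example families, searches all Sperner families on {0, ..., n-1} by backtracking, and evaluates the sum in (1.5) exactly. The search is only practical for n up to about 5.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def is_sperner(family) -> bool:
    """True if no set in the family is contained in another set of the family."""
    sets = [frozenset(b) for b in family]
    return all(not (a <= b or b <= a) for a, b in combinations(sets, 2))


def largest_sperner_family(n: int) -> int:
    """Size of the largest Sperner family on an n-set, by backtracking."""
    subsets = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    best = 0

    def extend(start: int, family: list) -> None:
        nonlocal best
        best = max(best, len(family))
        for i in range(start, len(subsets)):
            candidate = subsets[i]
            # Add the candidate only if it is incomparable with every member.
            if all(not (candidate <= b or b <= candidate) for b in family):
                extend(i + 1, family + [candidate])

    extend(0, [])
    return best


def lym_sum(family, n: int) -> Fraction:
    """The left side of (1.5): the sum of 1 / C(n, |B|) over B in the family."""
    return sum(Fraction(1, comb(n, len(b))) for b in family)
```

For n = 1, ..., 5 the search returns 1, 2, 3, 6 and 10, which are the values of C(n, ⌊n/2⌋).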

*Example 1.23  When are two subsets disjoint?  Alice chooses a k-subset at random from
an n-set. Bob chooses an l-subset at random from the same n-set. Find an exact expression and a
simple estimate for the probability that the two subsets are disjoint. For the estimate, you may assume
that $k = o(n^{2/3})$ and $l = o(n^{2/3})$.

Call the probability P(n, k, l). By the Rule of Product, there are $\binom{n}{k}\binom{n}{l}$ ways to choose two
subsets of the given sizes. There are $\binom{n}{k}\binom{n-k}{l}$ ways to choose two disjoint subsets of the given sizes.
Since things are done at random, all choices are equally likely and so

$$P(n,k,l) = \frac{\binom{n}{k}\binom{n-k}{l}}{\binom{n}{k}\binom{n}{l}} = \frac{(n-k)!/(n-k-l)!}{n!/(n-l)!} = \frac{(n-k)!}{n!} \cdot \frac{(n-l)!}{((n-l)-k)!}.$$

This is the exact answer written in various forms.

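The exact answer does not give us a good idea of how the probability behaves when the numbers
are large. To get a simple estimate, we use (1.2):

$$\frac{n!}{(n-k)!} \sim n^k e^{-k^2/2n} \qquad\text{and}\qquad \frac{(n-l)!}{((n-l)-k)!} \sim (n-l)^k e^{-k^2/2(n-l)}.$$

Thus

$$P(n,k,l) \sim \frac{(n-l)^k e^{-k^2/2(n-l)}}{n^k e^{-k^2/2n}} = \left(1 - \frac{l}{n}\right)^{k} \exp\!\left(-\frac{k^2 l}{2n(n-l)}\right).$$

We need to look at the two factors on the right. Inside the exponential we have $\frac{k^2 l}{2n(n-l)}$. Since l is small
compared to n, this is nearly $k^2 l/2n^2$. Since $k = o(n^{2/3})$ and $l = o(n^{2/3})$, we have $k^2 l = o((n^{2/3})^2 n^{2/3})$.
Combining the exponents on the right, $k^2 l = o(n^2)$. Thus $\frac{k^2 l}{2n(n-l)} \to 0$; that is, it is o(1). Since the exponential
of a number close to zero is close to one, the preceding estimate becomes

$$\begin{aligned}
P(n,k,l) &\sim \left(1 - \frac{l}{n}\right)^{k} = \exp\!\left(k \ln\!\left(1 - \frac{l}{n}\right)\right) \\
&= \exp\!\left(k\left(-\frac{l}{n} - \frac{l^2}{2n^2} + O\!\left(\frac{l^3}{n^3}\right)\right)\right) \qquad\text{by (1.1)} \\
&= \exp\!\left(-\frac{kl}{n} - \frac{kl^2}{2n^2} + O\!\left(\frac{kl^3}{n^3}\right)\right).
\end{aligned}$$

You should be able to show that $kl^2/2n^2 = o(1)$ and $kl^3/n^3 = o(n^{-1/3})$. Thus we have

$$P(n,k,l) \sim e^{-kl/n} \qquad\text{provided } k = o(n^{2/3}) \text{ and } l = o(n^{2/3}).$$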

Our constraints on the growth of k and l were necessary so that we could obtain our result, but
looking at the result we can see some justification for the constraints: When kl is much larger than
n, the probability will be very close to 0 and so may be uninteresting. If k and l are about the same
size, the low probability occurs when they grow faster than $n^{1/2}$. □
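To see how good the estimate is, we can compare it numerically with the exact answer. This sketch uses the form P(n, k, l) = C(n - l, k)/C(n, k) written as a product of k ratios, which avoids computing huge factorials. The function names and the sample values of n, k and l are our own choices.

```python
from math import exp


def disjoint_probability(n: int, k: int, l: int) -> float:
    """Exact P(n, k, l) = [(n-l)!/((n-l)-k)!] / [n!/(n-k)!] as a product."""
    p = 1.0
    for i in range(k):
        # Ratio of the i-th factors of the two falling factorials.
        p *= (n - l - i) / (n - i)
    return p


def disjoint_estimate(n: int, k: int, l: int) -> float:
    """The simple estimate exp(-kl/n)."""
    return exp(-k * l / n)
```

With n = 1,000,000 and k = l = 1,000 we have kl/n = 1, so the estimate is 1/e. The exact value differs from it by a relative error of roughly 0.1 percent, in line with the size of the neglected terms.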


*Error  Correcting  Codes 


We  want  to  represent  information  by  n-strings  of  zeroes  and  ones.  For  example,  the  information  may 
be  a  letter  of  the  alphabet.  ASCII  provides  a  way  of  doing  this:  an  8-string  is  used  to  represent  the 
upper  and  lower  case  alphabet,  the  digits,  the  punctuation  marks  and  some  special  "characters." 

The  ASCII  representation  of  characters  is  quite  sensitive  to  errors:  if  even  a  single  entry  in  the 
8-string  is  changed,  we  end  up  with  a  completely  different  character.  This  may  be  unacceptable. 
For  example,  suppose  the  characters  are  being  transmitted  over  a  data  link  which  may  have  a  small 
amount  of  static,  the  effect  of  which  is  to  sometimes  change  a  zero  to  a  one  or  vice  versa.  A  Soviet 
space  probe  was  lost  in  1989  because  of  a  single  character  error  in  a  lengthy  control  signal. 

What  can  we  do  about  the  problem  of  errors  in  transmission? 

One solution is to transmit the ASCII representation of each character some number k > 1
times. If k = 2 and the two transmitted values agree, we very likely have the correct value. If they
disagree, we must ask the sender to try again. If k = 3 and the three transmitted values disagree,
instead of asking for a retransmission, we can try to guess the answer by using a majority vote. For
example, suppose we transmit three copies of 01010101, which we receive as 01010001, 01110100
and 11010101. A majority vote on each digit gives us the correct answer. This is known as an error
correcting code. Of course, this method can fail. If we had received 01010001, 01110100 and 11010100
we would get the eighth digit wrong. We can increase our chances of getting the correct answer by
increasing k, the number of repetitions.
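
The majority vote can be sketched in a few lines of Python. This is only an illustration of the repetition idea; the function name `majority_decode` is ours, and the sketch assumes an odd number of copies so that every vote has a winner.

```python
def majority_decode(copies):
    """Bitwise majority vote over an odd number of equal-length bit strings."""
    if len(copies) % 2 == 0:
        raise ValueError("use an odd number of copies so every vote has a winner")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same length")
    decoded = []
    for position in range(length):
        # Count how many copies have a one in this position.
        ones = sum(c[position] == "1" for c in copies)
        decoded.append("1" if 2 * ones > len(copies) else "0")
    return "".join(decoded)


sent = "01010101"
# The two sets of received strings used in the text.
print(majority_decode(["01010001", "01110100", "11010101"]) == sent)
print(majority_decode(["01010001", "01110100", "11010100"]) == sent)
```

The second call differs from the transmitted string only in the eighth digit, where two of the three copies were corrupted.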

There are better error correcting codes: they allow us to send shorter strings and still be at
least as likely to be able to correct errors.

The basic idea is that we want to represent each of our characters by an n-string of zeroes and
ones in such a way that if $a_1 a_2 \ldots a_n$ represents one character and $b_1 b_2 \ldots b_n$ represents another,
then we often have $a_i \ne b_i$. Why is this good? It will help our discussion if we have some notation.
Let $A$ be the set of characters we are interested in and let $f$ be a function that assigns to each $a \in A$
the n-string that will be used to represent $a$; i.e., $f(a) \in \{0,1\}^n$.

For $s, t \in \{0,1\}^n$, let $d(s,t)$ be the number of positions in which $s$ differs from $t$. For example,
if $a = 001001$ and $b = 000101$, then $d(a,b) = 2$. Finally, let $d(f)$ be the minimum of $d(f(x), f(y))$
over all $x \ne y$ in $A$. Whenever $r$ and $t$ differ in a position, either $r$ and $s$ differ in that position or $s$
and $t$ differ in that position. Thus

$d(r,t) \le d(r,s) + d(s,t)$.    (1.6)

We  cannot  replace  the  inequality  with  an  equality,  because  r  and  t  may  agree  in  a  position  but  both 
may  differ  from  s  in  that  position. 
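
As a small sketch of these definitions, the following Python computes the distance between two strings and the minimum distance of a code. The names `hamming` and `min_distance` and the four-character code `TOY_CODE` are our own illustrations, not something taken from the text.

```python
from itertools import combinations


def hamming(s, t):
    """Number of positions in which the equal-length strings s and t differ."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(s, t))


def min_distance(code):
    """Minimum distance over all pairs of distinct codewords.

    `code` is a dict from characters to codewords, playing the role of f.
    """
    return min(hamming(u, v) for u, v in combinations(code.values(), 2))


# A hypothetical code on four characters, chosen only for illustration.
TOY_CODE = {"a": "00000", "b": "01011", "c": "10101", "d": "11110"}

print(hamming("001001", "000101"))  # the example pair from the text
print(min_distance(TOY_CODE))

# The triangle inequality can be strict: r and t agree everywhere
# but both differ from s in every position.
r, s, t = "000", "111", "000"
print(hamming(r, t), hamming(r, s) + hamming(s, t))
```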

Suppose that $d(f) = 2$, that we transmit $f(x)$ and that a single zero-one bit is changed by static
so that we receive $s$. We claim that we can tell an error has been made. If we can't tell, it must be
because $f(y) = s$ for some $y \in A$. This is impossible because it would imply that $d(f(x), f(y)) = 1$ and so
$d(f) \le 1$.

We can do more. Suppose that $d(f) = 3$, that we transmit $f(x)$ and that a single zero-one bit is
changed by static so that we receive $s$. We claim that $x$ is the only $y \in A$ such that $d(f(y), s) < 2$.
In other words, $x$ is the only character whose "encoding" is less than two errors away from $s$. Why
is this? By (1.6) and the definition of $d(f)$, if $y \in A$ and $y \ne x$, then

$3 = d(f) \le d(f(x), f(y)) \le d(f(x), s) + d(s, f(y)) = 1 + d(s, f(y))$.

Thus $d(s, f(y)) \ge 2$.

More generally, if $d(f) \ge 2k + 1$ and $s \in \{0,1\}^n$, there is at most one $x \in A$ with $d(f(x), s) \le k$.
Thus, if we assume that at most $k$ errors have been made, we can recover the value of $x$. Given that
$s$ is received, one wants to find an $x \in A$ so that $d(f(x), s)$ is a minimum. This is called "decoding."
Decoding efficiently is a difficult problem that we will not study.
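
Decoding by exhaustive search, which is not efficient but follows the definition directly, might look like the sketch below. The code `TOY_CODE` is a hypothetical example with minimum distance 3 (so k = 1), and the helper names `decode` and `flip` are ours.

```python
def hamming(s, t):
    """Number of positions in which the equal-length strings s and t differ."""
    return sum(a != b for a, b in zip(s, t))


def decode(received, code):
    """Return a character whose codeword is closest to `received`.

    Every codeword is tried, so the cost grows with the size of the code.
    If several codewords are equally close, the first one found is returned.
    """
    return min(code, key=lambda ch: hamming(code[ch], received))


def flip(word, position):
    """Change the bit of `word` at `position` (0-based)."""
    bit = "1" if word[position] == "0" else "0"
    return word[:position] + bit + word[position + 1:]


# Hypothetical code with minimum distance 3, chosen only for illustration.
TOY_CODE = {"a": "00000", "b": "01011", "c": "10101", "d": "11110"}

# Every single-bit error is corrected, as the argument in the text predicts.
for ch, word in TOY_CODE.items():
    for pos in range(len(word)):
        assert decode(flip(word, pos), TOY_CODE) == ch
print("all single-bit errors corrected")
```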

Suppose we want $d(f) \ge 2k + 1$. How large must $n$ be? First we study lower bounds on $n$ and
then we study upper bounds.


Example  1.24    A  lower  bound  on  codeword  length    Here's  the  idea  for  finding  a  lower  bound. 

Let $N(x)$ be the set of all n-strings $s$ such that $d(f(x), s) \le k$, where $d$ and $f$ are as defined in the
preceding paragraphs. Later we will prove that $|N(x)|$ does not depend on $x$. Let $N = |N(x)|$.
Suppose that $x \ne y$ are in $A$. We will prove that $N(x) \cap N(y) = \emptyset$, the empty set. The number of
n-strings must therefore be at least $N\,|A|$; however, there are $2^n$ n-strings and so $N\,|A| \le 2^n$.

We now prove that $N(x) \cap N(y) = \emptyset$ using proof by contradiction. Suppose $s \in N(x) \cap N(y)$.
Then $d(f(x), s) \le k$ and $d(s, f(y)) \le k$. By (1.6), $d(f(x), f(y)) \le 2k$, contradicting $d(f) \ge 2k + 1$.

We now compute $|N(x)|$ by noting that $s \in N(x)$ if and only if it differs from $f(x)$ in exactly $j$
positions for some $j \le k$. There are $\binom{n}{j}$ ways to select the $j$ positions that must be changed. Thus

$|N(x)| = \sum_{j=0}^{k} \binom{n}{j}$.

Incidentally, this proves that $|N(x)|$ does not depend on $x$.

Dividing our inequality $N\,|A| \le 2^n$ by $N$ and substituting our formula for $N = |N(x)|$, we
obtain

$|A| \le \frac{2^n}{\sum_{j=0}^{k} \binom{n}{j}}$.    (1.7)

The smallest $n$ for which (1.7) is true is a lower bound on how long the strings must be. Here are
the lower bounds that are obtained for k = 1 and 2.

|A|      2    3    4    5   10   20

k = 1    3    4    5    5    7    8
k = 2    5    7    7    8    9   11

For example, if we have 20 characters and want to be able to correct strings that contain at most
2  errors,  then  the  string  length  will  have  to  be  at  least  11  (and  possibly  larger  since  this  is  only  a 
lower  bound). 

The  bound  we  obtained  is  called  the  "sphere  packing  bound"  because  N{x)  is  thought  of  as  a 
type  of  sphere  with  center  x  and  radius  k.  □ 
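
The bound is easy to evaluate by machine. The following is a minimal sketch, assuming only the inequality "number of characters times sphere size is at most $2^n$" derived above; the function names `sphere_size` and `sphere_packing_min_n` are ours.

```python
from math import comb


def sphere_size(n, k):
    """Number of n-strings within distance k of a fixed n-string."""
    return sum(comb(n, j) for j in range(k + 1))


def sphere_packing_min_n(num_chars, k):
    """Smallest n for which num_chars disjoint spheres of radius k can fit."""
    n = 1
    while num_chars * sphere_size(n, k) > 2 ** n:
        n += 1
    return n


# Tabulate the lower bound for a few alphabet sizes and k = 1, 2.
for k in (1, 2):
    print(k, [sphere_packing_min_n(a, k) for a in (2, 3, 4, 5, 10, 20)])
```

Because the search simply increases n until the inequality holds, it returns a necessary length only; it says nothing about whether a code of that length exists.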

We've  shown  that,  if  there  is  a  code  for  A  that  corrects  up  to  k  errors,  then  the  length  n  of  the 
codewords  must  be  so  large  that  (1.7)  holds.  Now  we  want  a  result  in  the  other  direction;  that  is, 
we  want  an  inequality  such  that,  if  n  satisfies  it,  then  there  must  be  a  code  for  A  that  corrects  up 
to  k  errors.  In  other  words,  we  want  to  find  an  upper  bound  on  how  large  n  must  be.  There  are  at 
least  two  ways  to  obtain  such  a  result.  One  is  to  actually  construct  a  code.  Another  is  to  show  that 
among  all  possible  codes  for  A  having  words  of  length  n,  at  least  one  must  be  able  to  correct  k  and 
fewer  errors.  We'll  take  the  second  approach  and  use  a  probabilistic  argument. 

Example 1.25  An upper bound on codeword length  We begin by constructing a probability
space $(S, \Pr)$. Let $S$ be all possible subsets of size $|A|$ of $\{0,1\}^n$. In other words, $S$ consists of
all $\binom{2^n}{|A|}$ possible sets of codewords. To make a subset into a code, we simply associate an element of
the alphabet $A$ with each element of the subset. Let $\Pr$ be the uniform distribution on $S$. Thus the
elementary events are subsets which are potential codes; that is, the $|A|$-subsets of $\{0,1\}^n$. A subset
$C \in S$ will be good if every pair of its n-strings is at least distance $2k + 1$ apart. (We've used $C$ to
remind us that the subset is a potential code.) Then assigning letters to n-strings in $C$ will give us
a code for $A$ that corrects $k$ and fewer errors. We want to find an upper bound on $n$. This will be
an inequality on $n$ such that $S$ will contain at least one good subset if $n$ satisfies the inequality.

Here is a method that is often used to obtain such inequalities. Let the random variable $X$ be
the number of pairs of bad n-strings in a randomly chosen $C$. Thus $C$ is good whenever $X(C) = 0$.
Since $X$ must be a nonnegative integer, the expectation of $X$ is

$E(X) = \sum_{k=0}^{\infty} k \Pr(X = k) \ge \Pr(X > 0)$.

If we can prove that $E(X) < 1$, we will have $\Pr(X > 0) < 1$ and so $\Pr(X = 0) > 0$. Since $X(C) = 0$
means $C$ is good, there must be a good $C$. We'll evaluate $E(X)$ in a minute, but first, what is the
general method? Here it is.

•  Introduce a probability space $(S, \Pr)$.

•  Introduce a random variable $X$ such that

   •  the values of $X$ are nonnegative integers,

   •  if $X(s) = 0$, then $s \in S$ has the property we want.

•  Find conditions such that $E(X) < 1$.

We'll  now  do  the  last  step,  showing  that  E(X)  <  1. 

Since our probability is uniform and since each $C \in S$ contains $\binom{|A|}{2}$ pairs of n-strings,

the expected value of $X$,
which is

the number of pairs of n-strings in a random $C$ which are too close

equals

$\binom{|A|}{2}$ times the probability that two random n-strings are too close

which equals

$\binom{|A|}{2}$ times the probability that a new random n-string is too close to a given one.

As in the preceding example, we want the number of n-strings within distance $2k$ of a given string,
not counting the given string. You should be able to show that this equals $\sum_{i=1}^{2k} \binom{n}{i}$. Since there are
$2^n - 1$ strings other than the given one, we have

$E(X) = \binom{|A|}{2} \frac{\sum_{i=1}^{2k} \binom{n}{i}}{2^n - 1}$.

We wanted this to be less than one. In other words, given $|A|$ and $k$, there will be a k-error correcting
code with $|A|$ codewords if $n$ is so large that

$1 > \binom{|A|}{2} \frac{\sum_{i=1}^{2k} \binom{n}{i}}{2^n - 1}$.    (1.8)

Here is a table of the smallest values of $n$ that satisfy the inequality for some values of $|A|$ and two
values of $k$.

|A|      2    3    4    5   10   20

k = 1    3    7    8    9   12   15
k = 2    5   11   13   14   18   21

As you can see, these upper bounds are quite a bit larger than the lower bounds in the preceding
example. □
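
The expectation argument can be checked numerically. The sketch below computes the expected number of too-close pairs as described in this example (the number of pairs of codewords, times the number of strings within distance $2k$ of a given string, divided by $2^n - 1$) and searches for the smallest $n$ that makes it less than one. The function names are ours, and exact fractions are used to avoid rounding near 1.

```python
from fractions import Fraction
from math import comb


def expected_bad_pairs(num_chars, n, k):
    """Expected number of codeword pairs at distance at most 2k in a random code."""
    # Strings within distance 2k of a given string, excluding the string itself.
    close = sum(comb(n, i) for i in range(1, 2 * k + 1))
    return comb(num_chars, 2) * Fraction(close, 2 ** n - 1)


def probabilistic_min_n(num_chars, k):
    """Smallest n for which the expected number of bad pairs is below one."""
    n = 1
    while expected_bad_pairs(num_chars, n, k) >= 1:
        n += 1
    return n


# Tabulate the upper bound for a few alphabet sizes and k = 1, 2.
for k in (1, 2):
    print(k, [probabilistic_min_n(a, k) for a in (2, 3, 4, 5, 10, 20)])
```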

One  approach  to  creating  a  code  would  be  to  choose  n  so  that  the  right  side  of  (1.8)  is  not  too 
close  to  1.  For  example,  say  it  equals  0.7.  Since  this  number  is  E(X)  which  we  saw  was  an  upper 
bound  on  the  probability  that  a  random  code  is  bad,  there  is  a  30%  probability  that  a  randomly 
chosen  element  of  S  will  be  good.  After  a  few  random  tries,  we  should  be  able  to  find  a  code.  This 
seems to be an easy way to construct error correcting codes. It is, but they're no good! Why is
this? With a random set of codewords, there is no problem encoding our message for transmission;
however, if $|A|$ is large, it will be quite difficult to decode. To make decoding easy, one needs to
construct  a  code  that  has  some  nice  structure  that  one  can  use.  This  need  has  led  to  a  considerable 
amount  of  research  and  to  texts  on  the  subject. 
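
To make the "few random tries" concrete, here is a sketch of the random search. It only illustrates the idea in the paragraph above and, as noted there, the codes it finds have no structure that would help with decoding. The function names, the seed, and the parameters (4 characters, length 8, k = 1) are our own choices.

```python
import random
from itertools import combinations


def hamming(s, t):
    """Number of positions in which the equal-length strings s and t differ."""
    return sum(a != b for a, b in zip(s, t))


def is_good(codewords, k):
    """True if every pair of codewords is at least distance 2k + 1 apart."""
    return all(hamming(u, v) >= 2 * k + 1 for u, v in combinations(codewords, 2))


def random_good_code(num_chars, n, k, rng, max_tries=10_000):
    """Draw random sets of num_chars distinct n-strings until one is good."""
    for _ in range(max_tries):
        # Sampling integers without replacement gives distinct n-strings.
        values = rng.sample(range(2 ** n), num_chars)
        codewords = [format(v, f"0{n}b") for v in values]
        if is_good(codewords, k):
            return codewords
    return None  # no good code found within the allowed number of tries


code = random_good_code(4, 8, 1, random.Random(0))
print(code)
```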


Exercises 


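1.3.1.  Suppose that $k$ and $n - k$ both get large as $n$ gets large. Use Stirling's formula to show that

$\binom{n}{k} \sim \frac{1}{\sqrt{2\pi n \lambda (1-\lambda)}} \left( \frac{1}{\lambda^{\lambda} (1-\lambda)^{1-\lambda}} \right)^{n}$    where $\lambda = k/n$.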

1.3.2.  Suppose  we  have  an  election  between  two  candidates  and  the  ballots  are  counted  one-by-one.  At 

the  end,  the  candidates  are  tied  with  n  votes  each.  If  the  order  of  the  votes  is  random,  what  is  the 
probability  that  one  of  the  candidates  was  never  behind  in  the  counting? 
Hint.  See  Example  1.13. 

1.3.3.  How  many  6  card  hands  contain  3  pairs? 

1.3.4.  How  many  ways  can  a  5  card  hand  containing  2  pairs  be  dealt?  In  other  words,  the  order  in  which 
a  person  gets  her  cards  matters. 

1.3.5.  How  many  5  card  hands  contain  a  straight?  A  straight  is  5  consecutive  cards  from  the  sequence 
A,2,3,4,5,6,7,8,9,10,J,Q,K,A  without  regard  to  suit. 

1.3.6.  How  many  compositions  of  n  are  there  that  have  exactly  k  parts?  The  composition  1,2,2  of  5  has 
3  parts. 

Hint.  See  Exercise  1.1.4. 

1.3.7.  How many rearrangements of the letters in EXERCISES are there? How many arrangements of eight
letters  can  be  formed  using  the  letters  in  EXERCISES?  (No  letter  may  be  used  more  frequently  than 
it  appears  in  EXERCISES.) 

1.3.8.  In  some  card  games  only  the  values  of  the  cards  matter  and  their  suits  are  irrelevant.  Thus  there  are 
effectively  only  13  distinct  cards.  How  many  different  ways  can  a  deck  of  cards  be  arranged  in  this 
case?  The  answer  is  a  multinomial  coefficient. 

1.3.9.  Return  to  choosing  teams  (Example  1.21).  Suppose  half  the  people  are  women  and  half  are  men, 
that  each  team  must  be  as  nearly  evenly  split  as  possible,  and  that  there  is  one  referee  of  each  sex. 
How  many  ways  can  this  be  done? 

1.3.10.  There  is  an  empire  in  the  far  away  galaxy  we've  been  visiting.  They  use  the  same  alphabet  (A,I,L,S,T) 
but  their  names  consist  of  seven  letters.  Each  name  begins  and  ends  with  a  consonant,  contains  no 
adjacent  vowels  and  never  contains  three  adjacent  consonants.  As  before,  if  two  consonants  are 
adjacent,  they  cannot  be  the  same. 

(a)  List  the  first  4  names  in  dictionary  order. 

(b)  List  the  last  4  names  in  dictionary  order. 

(c)  What  are  the  first  4  names  in  dictionary  order  with  just  2  vowels? 

(d)  How  many  names  are  possible? 

*1.3.11.  (Multinomial Theorem)  Prove that the coefficient of $y_1^{m_1} y_2^{m_2} \cdots y_k^{m_k}$ in $(y_1 + y_2 + \cdots + y_k)^n$ is the
multinomial coefficient $n!/(m_1!\, m_2! \cdots m_k!)$ when $n = m_1 + \cdots + m_k$ and zero otherwise.
Hint. Write

$(y_1 + y_2 + \cdots + y_k)^n = ((y_1 + y_2 + \cdots + y_{k-1}) + y_k)^n$.

Now use the Binomial Theorem (Theorem 1.7) and induction on $k$.

1.3.12.    Prove  the  following. 


(the  signs  alternate); 


1.4  Recursions


Let's explore yet another approach to evaluating the binomial coefficient $C(n,k)$. As in the previous
section, let $S = \{x_1, \ldots, x_n\}$. We'll think of $C(n,k)$ as counting k-subsets of $S$. Either the element $x_n$
is in our subset or it is not. The cases where it is in the subset are all formed by taking the various
(k - 1)-subsets of $S - \{x_n\}$ and adding $x_n$ to them. The cases where it is not in the subset are
all formed by taking the various k-subsets of $S - \{x_n\}$. What we've done is describe how to build
k-subsets of $S$ from certain subsets of $S - \{x_n\}$. Since this gives each subset exactly once,

$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$

by the Rule of Sum.

The equation $C(n,k) = C(n-1,k-1) + C(n-1,k)$ is called a recursion because it tells how
to compute $C(n,k)$ from values of the function with smaller arguments. This is a common approach
which we can state in general form as follows.


Technique.  Deriving  recursions  Answering  the  question  "How  can  I  construct  the  things 
I  want  to  count  by  using  the  same  type  of  things  of  a  smaller  size?"  usually  gives  a  recursion. 

Sometimes  it  is  easier  to  answer  the  question  "How  can  I  break  the  things  I  want  to  count  up 
into  smaller  things  of  the  same  type?"  This  usually  gives  a  recursion  when  it  is  turned  around 
to  answer  the  previous  question. 


Let's see how the second approach works for subsets. Given our collection of k-element subsets of
$S$, throw out $x_n$ if $x_n$ is present. We obtain some (k - 1)-element subsets of $S - \{x_n\}$ and some
k-element subsets of $S - \{x_n\}$. In fact, you should be able to see that we obtain all (k - 1)-element
subsets and all k-element subsets exactly once. Turning this around gives us a way to build up
k-element subsets of $S$.

We  can  use  a  recursion  to  compute  a  table  of  values  by  starting  at  the  first  row  and  computing 
new  entries  by  adding  previous  ones.  The  arrows  in  Figure  1.3  show  how  this  is  done  for  the  binomial 
coefficients.  If  the  labels  in  this  table  are  dropped,  the  rows  are  shifted  slightly  and  a  single  1  is 
added  to  the  top  row,  then  we  obtain  what  is  called  Pascal's  triangle.  (See  the  figure.) 

Actually,  we've  cheated  a  bit  in  all  of  this  because  the  recursion  only  works  when  we  have  some 
values  to  start  with.  The  correct  statement  of  the  recursion  is  either 

$C(0,0) = 1$,

$C(0,k) = 0$    for $k \ne 0$ and

$C(n,k) = C(n-1,k-1) + C(n-1,k)$    for $n > 0$;

or

$C(1,0) = C(1,1) = 1$,

$C(1,k) = 0$    for $k \ne 0, 1$ and

$C(n,k) = C(n-1,k-1) + C(n-1,k)$    for $n > 1$;

depending on whether we want to start with the row of Pascal's triangle consisting of 1 alone or the
row consisting of 1,1. These starting values are called initial conditions. Note that, in either case,
the last two conditions guarantee that $C(n,k) = 0$ for all $k < 0$.
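
The first form of the recursion translates directly into a table-building loop. The sketch below is one way to do it; the function name `binomial_table` is ours, and columns with $k < 0$ are handled by treating $C(n-1,-1)$ as zero rather than by storing them.

```python
from math import comb


def binomial_table(max_n):
    """Rows n = 0..max_n of C(n, k) for k = 0..max_n, built from the recursion."""
    # Initial conditions: C(0, 0) = 1 and C(0, k) = 0 for k != 0.
    rows = [[1] + [0] * max_n]
    for n in range(1, max_n + 1):
        prev = rows[-1]
        # C(n, 0) = C(n-1, -1) + C(n-1, 0), and C(n-1, -1) = 0.
        row = [prev[0]]
        for k in range(1, max_n + 1):
            row.append(prev[k - 1] + prev[k])
        rows.append(row)
    return rows


for row in binomial_table(6):
    print(row)
```

Each new row needs only the row before it, so a program that wants a single row can discard the earlier ones as it goes.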

                 Values of k
            0   1   2   3   4   5   6

        0   1   0   0   0   0   0   0
        1   1   1   0   0   0   0   0
Values  2   1   2   1   0   0   0   0
of n    3   1   3   3   1   0   0   0
        4   1   4   6   4   1   0   0
        5   1   5  10  10   5   1   0
        6   1   6  15  20  15   6   1

                    1
                  1   1
                1   2   1
              1   3   3   1
            1   4   6   4   1
          1   5  10  10   5   1
        1   6  15  20  15   6   1


Figure 1.3  Left: The binomial coefficients are computed recursively: each entry is the sum of the entry
directly above it and the entry above and to its left. The columns of zeroes for k < 0 are
omitted.     Right: The results are arranged to give Pascal's triangle.


Example 1.26  Alternating subsets  Let $t_n$ be the number of subsets of {1, 2, ..., n} such that,
when the elements of the subset are listed in increasing order, the first is odd, the second is even,
the third is odd, and so forth. We will allow the empty subset. Thus $t_0 = 1$ and $t_1 = 2$ because of ∅
and {1}. When n = 4 the subsets are

∅  {1}  {1,2}  {1,2,3}  {1,2,3,4}  {1,4}  {3}  {3,4},

and so $t_4 = 8$. Throwing out the subsets containing 4, we see that $t_3 = 5$. Throwing out those
containing 3 or 4, we see that $t_2 = 3$.

How can we get a recursion for $t_n$? We can't simply take an acceptable subset for n - 1 and
either add n to it or not. Why? For example, adding 4 to the subset {1,2} counted by $t_3$ would give
{1,2,4}, which is not allowed. Of course, not adding an element is always safe. In other words, every
subset counted by $t_n$ that does not contain n is counted by $t_{n-1}$ and conversely. If we can somehow
figure a way to get the subsets counted by $t_n$ that contain n, we'll be done.

Let's look again at the subsets for $t_4$ that contain 4. There are three of them:

{1,2,3,4},    {1,4}  and  {3,4}.

Can we somehow reduce this to one or more groups of alternating subsets with n < 4? Since $t_2 = 3$,
this might be a good place to start. To reduce our list to subsets counted by $t_2$, we'll need to throw
out 3 and 4:

{1,2},    {1}  and  ∅.

We've got the subsets counted by $t_2$. That's good, but can we reverse the process? Yes. Add 4 to each
subset. If the resulting subset is not alternating, add 3 to it, too, and the result will be alternating.
That's it!

Let's state it in general. We build the subsets counted by $t_n$ in two ways.

(a)  Take a subset counted by $t_{n-1}$.

(b)  Take a subset S counted by $t_{n-2}$. Exactly one of the following will give a new alternating subset.

(i)  Add n to S.

(ii)  Add n - 1 and n to S.

In fact, if the largest element of S and n have the same parity (i.e., both odd or both even), we
use (ii); if different parity, we use (i).
This requires n ≥ 2 since we need to have n - 1 ≥ 0 and n - 2 ≥ 0 for (a) and (b) to make sense.
You should be able to see that the procedure gives every alternating subset of {1, 2, ..., n} exactly
once. We've proved that

$t_0 = 1$,    $t_1 = 2$    and, for n ≥ 2,    $t_n = t_{n-1} + t_{n-2}$.
These are the Fibonacci numbers, which can be found in the index. □
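
A brute-force check of this recursion for small n is a useful sanity test. The sketch below lists every subset of {1, ..., n}, keeps the alternating ones, and compares the count with the value produced by the recursion. The function names are ours.

```python
from itertools import combinations


def is_alternating(subset):
    """True if the i-th smallest element (counting from 1) has the parity of i."""
    return all(x % 2 == i % 2 for i, x in enumerate(subset, start=1))


def count_alternating(n):
    """Count alternating subsets of {1, ..., n} by listing all subsets."""
    universe = range(1, n + 1)
    # combinations() yields each subset with its elements in increasing order.
    return sum(
        is_alternating(c)
        for size in range(n + 1)
        for c in combinations(universe, size)
    )


def count_by_recursion(n):
    """Value given by the initial conditions 1, 2 and the sum of the previous two."""
    a, b = 1, 2
    for _ in range(n):
        a, b = b, a + b
    return a


for n in range(8):
    print(n, count_alternating(n), count_by_recursion(n))
```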

Example  1.27  Set  partitions  A  partition  of  a  set  B  is  a  collection  of  nonempty  subsets  of  B 
such  that  each  element  of  B  appears  in  exactly  one  subset.  Each  subset  is  called  a  block  of  the 
partition.  The  15  partitions  of  {1, 2,  3, 4}  by  number  of  blocks  are 

1  block:  {1,2,3,4} 

2  blocks:    {{1,2, 3},  {4}}      {{1,  2, 4},  {3}}  {{1,  2},  {3, 4}}      {{1, 3, 4},  {2}} 

{{1,3},{2,4}}      {{1,4},{2,3}}  {{1},{2,3,4}} 

3  blocks:    {{1,2},{3},{4}}  {{1,  3},  {2},  {4}}  {{1, 4},  {2},  {3}}  {{1},  {2, 3},  {4}} 

{{1},{2,4},{3}}  {{1},{2},{3,4}} 

4  blocks:  {{1},{2},{3},{4}} 

Let $S(n,k)$ be the number of partitions of an n-set having exactly k blocks. These are called Stirling
numbers of the second kind.

Do not confuse $S(n,k)$ with $C(n,k) = \binom{n}{k}$. In both cases we have an n-set. For $C(n,k)$ we
want to choose a subset containing k elements and for $S(n,k)$ we want to partition the set
into k blocks.

What is the value of $S(n,k)$? Let's try to get a recursion using the two questions in our technique.

How can we build partitions of S = {1, 2, ..., n} with k blocks out of smaller cases? Using
the approach we used for binomial coefficients, we'll take a partition of S - {n} and add n to it
somehow to get a k-block partition of S. If we take partitions of {1, 2, ..., n - 1} with k - 1 blocks,
we can simply add the block {n}. If we take partitions of {1, 2, ..., n - 1} with k blocks, we can
add the element n to one of the k blocks. You should convince yourself that all k-block partitions of
{1, 2, ..., n} arise in exactly one way when we do this. This gives us a recursion for $S(n,k)$. Putting
n in a block by itself contributes $S(n-1,k-1)$. Putting n in a block with other elements contributes
$S(n-1,k) \times k$ by the Rule of Product. By the Rule of Sum

$S(n,k) = S(n-1,k-1) + k\,S(n-1,k)$.    (1.9)

We  leave  it  to  you  to  determine  the  values  of  n  and  k  for  which  this  is  valid  and  to  determine  the 
initial  conditions.  You  can  construct  the  analog  of  Figure  1.3  as  an  exercise. 

Now let's take the second question approach: How can we tear down a set partition into something
smaller? As we did with subsets, we can simply remove n from our partition of {1, 2, ..., n}.
You should convince yourself that this gives (1.9). There is another approach to tearing down: Instead
of simply throwing out n, we can throw out the entire block containing n. If there are j
elements in that block, throwing it out gives us a partition of an (n - j)-subset of {1, 2, ..., n - 1}
into k - 1 blocks. This gives all such partitions exactly once. Since there are $\binom{n-1}{n-j}$ ways to choose
the subset, we have

$S(n,k) = \sum_{j=1}^{n} \binom{n-1}{n-j} S(n-j, k-1)$    for k > 1.    (1.10)

The initial conditions are $S(n,1) = 1$ for n ≥ 1 and $S(n,1) = 0$ for n ≤ 0.

At this point you may well expect us to come up with an explicit formula for $S(n,k)$ by a direct
counting argument or a generating function argument since we did both for $C(n,k)$. These can both
be done; however, more tools are required. They are developed in later chapters. Explicit formulas
for $S(n,k)$ are not as nice as $C(n,k) = \frac{n!}{k!\,(n-k)!}$ since the simplest formula for $S(n,k)$ involves
summation. □
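
Both ways of tearing down a partition can be compared numerically. In the sketch below, `stirling2` follows the "remove n" recursion and `stirling2_by_block_removal` follows the "remove the whole block containing n" idea, summing over the size j of that block. The initial conditions $S(0,0) = 1$ and $S(n,k) = 0$ whenever exactly one of n, k is zero are one possible choice that we assume here; the text leaves that determination to the reader. The function names are ours.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Partitions of an n-set into k blocks, by removing the element n."""
    if n == 0 and k == 0:
        return 1  # assumed initial condition: the empty set has one partition
    if n <= 0 or k <= 0:
        return 0  # assumed initial conditions for the remaining boundary cases
    # n is a block by itself, or n joins one of the k existing blocks.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def stirling2_by_block_removal(n, k):
    """Same count for k > 1, by removing the whole block that contains n."""
    if k <= 1:
        raise ValueError("this form is only used for k > 1")
    # The block has j elements; choose which n - j of the other n - 1 remain.
    return sum(
        comb(n - 1, n - j) * stirling2(n - j, k - 1) for j in range(1, n + 1)
    )


# Number of partitions of {1, 2, 3, 4} by number of blocks.
print([stirling2(4, k) for k in range(1, 5)])
```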

So far all we've done is find recursions for various numbers and use the recursions to construct
values.  This  is  not  the  only  way  recursions  can  be  used.  Here  are  some  others: 

•  Prove  a  formula,  usually  by  induction:  We'll  see  an  example  of  this  in  a  minute. 

•  Discover  that  two  sets  of  numbers  are  the  same  because  they  have  the  same  recursion.  (Remember 
to  include  the  initial  conditions!) 

•  Study  the  numbers  by  looking  directly  at  the  recursion  or  by  using  generating  functions:  More 

on  this  in  Part  IV. 

To illustrate a proof by induction, let's do Exercise 1.3.12(b), namely $\sum_{k=0}^{n} \binom{n}{k} = 2^n$ when
n ≥ 0. It's easy to check it for n = 0. Suppose n > 0 and the result is true for all values less than n.
By the recursion

$\sum_{k=0}^{n} \binom{n}{k} = \sum_{k=0}^{n} \binom{n-1}{k-1} + \sum_{k=0}^{n} \binom{n-1}{k}$.

Since the terms $\binom{n-1}{-1}$ and $\binom{n-1}{n}$ are zero, each of the last two sums is $2^{n-1}$ by the induction
hypothesis and we are done since $2^{n-1} + 2^{n-1} = 2^n$.


Exercises 


1.4.1.  Calculate  the  next  two  rows  in  Pascal's  Triangle. 

1.4.2.  Equation  (1.9)  gives  a  recursion  for  S{n,  k),  but  it  is  incomplete:  initial  conditions  and  the  values  of  n 
and  k  for  which  it  holds  were  omitted.  Determine  the  values  of  n  and  k  for  which  it  is  valid.  Determine 
the  initial  conditions.  Construct  a  table  of  values  for  S{n,  k)  up  through  n  =  5. 

1.4.3.  Derive a recursion like $S(n,k) = S(n-1,k-1) + k\,S(n-1,k)$ for ordered k-lists without repetitions
that can be made from an n-set. Derive the recursion using an argument like that for $S(n,k)$; do not
get the recursion using the formula $n!/(n-k)!$ that we found earlier. Since "like" is rather vague,
there can be more than one solution to this exercise.

1.4.4.  In Exercise 1.3.12(c) you were asked to prove

$\sum_{k=0}^{n} (-1)^k \binom{n}{k} = 0$    for n ≥ 1.

Prove it by induction on n using the recursion $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$.

1.4.5.  For n > 0, prove the following formulas for $S(n,k)$.

$S(n,n) = 1$       $S(n,n-1) = \binom{n}{2}$       $S(n,1) = 1$       $S(n,2) = (2^n - 2)/2$


1.4.6.  How  can  the  initial  conditions  be  set  up  to  make  (1.10)  true  for  n  >  1? 

1.4.7.  "Marking" something can help us derive a recursion. How many ways can we construct a k-subset of
{1, 2, ..., n} and mark an element in the subset? You can do this in two ways:

•  choose the subset and mark the element or

•  choose the marked element and then choose the rest of the subset.

By counting these two ways, obtain the recursion $\binom{n}{k} = \frac{n}{k} \binom{n-1}{k-1}$ for k > 0.

1.4.8.  Let $B_n$ be the total number of partitions of an n element set. Thus

$B_n = S(n,0) + S(n,1) + \cdots + S(n,n)$.

(a)  Prove that

$B_{n+1} = \sum_{i=0}^{n} \binom{n}{i} B_{n-i}$,

where $B_0$ is defined to be 1.
Hint. Construct the block containing n + 1 and then construct the rest of the partition.

(b)  Calculate $B_n$ for n ≤ 5.

*1.4.9.  Return  to  Exercise  1.2.13  (p.  18).  You  should  have  done  (a)  and  (d)  previously.  Now  you  should  be 
able  to  do  (b)  and  (c)  and  obtain  a  recursion  for  (e).  (Later,  we  will  see  how  to  use  the  "Principle  of 
Inclusion  and  Exclusion"  to  obtain  another  solution  for  (e).) 

1.4.10.  We want to count the number of n digit sequences that have no adjacent zeroes. The digits must be
chosen from the set {0, 1, ..., d - 1}. For example, with d = 3 and n = 4, the sequences 0,2,1,0 and
2,1,2,2 are valid but 1,0,0,2 and 1,3,2,3 are not. Let the number of such sequences be $A_n$. (The case
d = 2 is called the Fibonacci numbers.)

(a)  From an n-sequence, remove the last digit if it is nonzero and the last two digits if the last digit
is zero. By reversing this process, describe a way to build up all acceptable sequences by adding
elements one or two at a time.

(b)  Use (a) to obtain a recursion of the form $A_n = aA_{n-1} + bA_{n-2}$. What are a and b? For what n
is the recursion valid? What are the initial conditions?

(c)  Compute $A_n$ for n ≤ 5 when d = 10.

1.5  Multisets 


Let $M(n,k)$ be the number of ways to choose k elements from an n-set when repetition is allowed
and order doesn't matter. Will any of our three methods for handling $C(n,k)$ work for $M(n,k)$?
Let's examine them.

•  Imposing an order: The critical observation for our first method was that an unordered list can
be ordered in k! ways. This is not true if repetitions are allowed. To see this, note that the
extreme case of k repetitions of one element has only one ordering.

•  Using a recursion: We might be able to obtain a recursion, but we would still be faced with the
problem of solving it.

•  Using generating functions: To use the generating functions we have to allow for repetitions.
This can be done very easily: Simply replace $(1 + x_i)$ in Example 1.14 (p. 19) with the infinite
sum

$1 + x_i + x_i^2 + x_i^3 + \cdots$,

a geometric series which sums to $(1 - x_i)^{-1}$. Why does this replacement work? When we studied
$C(n,k)$ in Example 1.14, the two terms in the factor $1 + x_i$ corresponded to not choosing the ith
element OR choosing it, respectively. Now we need more terms: $x_i x_i$ for when the ith element
is chosen to appear twice in our unordered list, $x_i x_i x_i$ for three appearances, and so forth. The
distributive law still takes care of producing all possible combinations. As in Example 1.14, if
we replace $x_i$ by $x$ for all i, the coefficient of $x^k$ will be the number of multisets of size k. Thus
$M(n,k)$ is the coefficient of $x^k$ in $(1 - x)^{-n}$. You should be able to use this fact and Taylor's
Theorem to obtain $M(n,k) = (n + k - 1)!/((n - 1)!\,k!)$. Thus

Theorem 1.8  Multiset formula  The number of k-multisets that can be made from an
n-set is

$M(n,k) = \binom{n+k-1}{k}$.

Wc  can  stop  here  since  we  have  the  answer;  however,  someone  with  an  inquiring  mind  is  hkely 
not  to.  Such  a  person  might  ask  "Why  is  M(n,  k)  the  number  of  ways  to  choose  a  k  element  subset  of 
an  n  —  1  +  fc  element  set?"  Here  "why"  means  an  explanation  that  proves  the  two  numbers  are  equal 
without  actually  counting.  Posing  and  answering  questions  like  this  improve  our  understanding  of  a 
topic  and  improve  our  abilities  to  use  the  tools.  We'll  give  one  answer  now.  Another  appears  in  the 
exercises. 

Given  a  fc-multiset  of  positive  integers,  list  them  in  nondecreasing  order,  ^  say  Oi  <  02  <  . . .  <  afc. 

For  each  i,  increase  cii  by  i  —  1  to  obtain  a  new  list.  The  new  list  consists  of  k  distinct  postivc  integers 
in  increasing  order.  This  sets  up  a  one-to-one  correspondence  between  multisets  of  positive  integers 
and  sets  of  positive  integers. 

What do the $k$-multisets formed from $\{1, 2, \ldots, n\}$ correspond to? Since the largest element in the multiset is increased by $k-1$, each such multiset corresponds to a $k$-subset of $T = \{1, 2, \ldots, n+k-1\}$. Conversely, every $k$-subset $X$ of $T$ corresponds to such a multiset: Simply list the elements of $X$ in increasing order and subtract $i-1$ from the $i$th element for each $i$.

We have proved that in our one-to-one correspondence the multisets counted by $M(n,k)$ correspond to the sets counted by $C(n+k-1, k)$. Thus, these two numbers must be equal.
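The correspondence just described can be checked by machine for small cases. The following Python sketch is only an illustration; the function names are our own and are not part of the text. It applies the "add $i-1$ to the $i$th element" map to every $k$-multiset of $\{1, \ldots, n\}$ and confirms that the images are exactly the $k$-subsets of $\{1, \ldots, n+k-1\}$.

```python
from itertools import combinations, combinations_with_replacement
from math import comb


def multiset_to_subset(multiset):
    """Add i-1 to the i-th smallest element (i = 1, 2, ...), as in the text."""
    # enumerate counts from 0, which is exactly i-1 for the i-th element.
    return tuple(a + i for i, a in enumerate(sorted(multiset)))


def subset_to_multiset(subset):
    """Inverse map: subtract i-1 from the i-th smallest element."""
    return tuple(x - i for i, x in enumerate(sorted(subset)))


def check_bijection(n, k):
    """True if the map pairs k-multisets of {1..n} with k-subsets of {1..n+k-1}."""
    multisets = list(combinations_with_replacement(range(1, n + 1), k))
    subsets = set(combinations(range(1, n + k), k))
    images = {multiset_to_subset(m) for m in multisets}
    round_trip = all(
        subset_to_multiset(multiset_to_subset(m)) == m for m in multisets
    )
    return (
        images == subsets
        and round_trip
        and len(multisets) == comb(n + k - 1, k)
    )
```

For instance, `multiset_to_subset((1, 1, 3))` gives `(1, 2, 5)`, and `check_bijection(4, 3)` confirms the correspondence for $n = 4$, $k = 3$. A check of this kind is evidence for small cases, not a substitute for the proof above.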

Example  1.28  Balls  in  boxes  We  are  given  4  labeled  boxes  each  of  which  can  hold  2  balls 
and  are  also  given  4  identical  red  balls  and  4  identical  green  balls.  How  many  ways  can  the  balls  be 
placed  in  the  boxes? 

This is not a problem that fits into our multiset model easily, although it can be made to fit. Nevertheless, it is the sort of problem that our methods work for. Indeed, it is very similar to the card hand problems. We'll look at it as if we hadn't seen those problems to emphasize the need to be able to translate problem descriptions without needing to force them into particular frameworks.

To begin with, we observe that once the red balls have been placed into boxes, there is only one way to place the green balls. (This is because there are exactly as many positions available as there are balls.) Thus, we can simply focus on placement of the red balls. Since there aren't very many ways to do that, we could simply list all of them. There is another approach that requires less work: First do the problem with unlabeled boxes and then label them. The unlabeled solutions are simply partitions of the number 4 into 4 parts, with zeroes allowed and no part exceeding 2. (A partition of a number is an unordered list that sums to the number.) The solutions are

$$1+1+1+1, \qquad 0+1+1+2 \qquad\text{and}\qquad 0+0+2+2.$$

The first of these can be labeled in one way. The second can be labeled in 12 ways: choose the label for the empty box (4 ways) and then the label for the box containing 2 red balls (3 ways). The third solution can be labeled in $\binom{4}{2} = 6$ ways. Thus, there are $1 + 12 + 6 = 19$ solutions to the original problem. □
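The count in Example 1.28 is small enough to confirm by exhaustive search. The sketch below is a brute-force check under our own naming (the function `count_fillings` and its parameters are illustrative, not from the text); it tries every way of assigning red and green counts to the labeled boxes and keeps those that respect the capacity.

```python
from itertools import product


def count_fillings(boxes=4, capacity=2, red=4, green=4):
    """Count placements of identical red and green balls into labeled boxes."""
    count = 0
    # r[i] and g[i] are the numbers of red and green balls in box i.
    for r in product(range(capacity + 1), repeat=boxes):
        if sum(r) != red:
            continue
        for g in product(range(capacity + 1), repeat=boxes):
            fits = all(ri + gi <= capacity for ri, gi in zip(r, g))
            if sum(g) == green and fits:
                count += 1
    return count
```

With the default arguments this reproduces the 19 found above. The search does not use the observation that the green balls are forced, so it also serves as an independent check of that step.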

^ A sequence is in nondecreasing order if the elements do not decrease as we move along the sequence. It is in increasing order if the elements increase. Thus $-7, 3, 5, 6$ is both increasing and nondecreasing, $3, 4, 4, 6$ is nondecreasing but not increasing, and $3, 5, 6, 4$ is neither.

Given a set $S$, forming a $k$-subset or a $k$-multiset from $S$ are two extremes: for a $k$-subset, no element can be repeated; for a $k$-multiset, elements can be repeated as much as desired (as long as the total equals $k$). If we want something between the extremes, the counting is more difficult. For example, there's no simple formula for the number of $k$-multisets if each element appears at most $j$ times except for $j = 1$ and $j > k$.


Exercises 


1.5.1.  How  many  multisets  can  be  formed  from  a  set  S  if  each  element  can  appear  at  most  j  times?  Your 
answer  should  be  a  simple  formula. 

1.5.2. It was stated in the preceding paragraph that "there's no simple formula for the number of $k$-multisets if each element appears at most $j$ times except for $j = 1$ and $j > k$." What are the formulas for $j = 1$ and $j > k$?

1.5.3. Without using the formula for $M(n,k)$, prove that $M(n,k) = M(n-1,k) + M(n,k-1)$. What are the initial conditions for this recursion?

1.5.4.  Prove  that  M(n,  k)  is  the  number  of  ways  to  place  k  indistinguishable  balls  into  n  boxes. 

Hint. If you have $n = 7$ boxes and $k = 8$ balls, the list 1, 1, 1, 2, 4, 4, 4, 7 can be interpreted as "Place three balls in box 1, one ball in box 2, three balls in box 4 and one ball in box 7."

1.5.5. Imagine $\{1, 2, \ldots, n+k-1\}$ represented as points on a line in the usual way. Convert $n-1$ of the points to vertical bars and convert 0 and $n+k$ to vertical bars. Combine this with the previous problem to prove that $M(n,k) = C(n+k-1, n-1)$. This gives one answer to the question of "why" the two numbers are equal.

Hint. Here are examples of a correspondence with 5 balls and 4 boxes:

points       0 1 2 3 4 5 6 7 8 9        0 1 2 3 4 5 6 7 8 9
conversion   | • • | • • | • | |        | • | • | | • • • |
box no.         1     2    3  4           1   2  3    4

1.5.6. Prove that the number of unordered $k$-lists made from $n$ different items and using each item at most twice is the coefficient of $x^k$ in $(1 + x + x^2)^n$. Generalize this.

1.5.7. Let $T(n,k)$ be the number of $k$-multisets made from $n$ different items, using each item at most twice in a multiset. Prove that

$$T(n,k) = T(n-1,k) + T(n-1,k-1) + T(n-1,k-2).$$

Relate this problem to the previous exercise and generalize it.

1.5.8. Prove by induction on $n$ and $k$ that the number of $k$-multisets that can be formed from an $n$-set is $\binom{n+k-1}{k}$. Let the answer be $M(n,k)$. To start the induction, verify the formula for $M(1,k)$ and for $M(n,1)$ for all $n$ and $k$. For the induction step, use $M(n,k-1)$ and $M(n-1,k)$ to derive $M(n,k)$.

1.5.9. Let $f(b,t)$ be the number of ways to put $b$ labeled balls into $t$ labeled tubes. When balls are put into tubes the order matters: because the diameter of the tube is only slightly larger than that of the balls, the balls end up stacked on top of each other in a tube.

(a) Prove by induction on $b$ that $f(b,t) = t(t+1)\cdots(t+b-1)$. (To do this, you will first need a recursion for $f(b,t)$.)

Hint. There are at least two ways to get a recursion on $b$: (i) insert $b-1$ balls and then the last or (ii) insert the first ball and then the remaining $b-1$.

(b) Give a noninductive combinatorial proof of the formula for $f(b,t)$.

*1.5.10. Let $f(n,k)$ be the number of ways to partition an $n$-set into $k$ nonempty blocks where the order of the entries in a block matters but the order of the blocks does not matter.

(a)  Prove  by  induction  that 


Hint. An argument like the one leading to (1.9) can be used for the induction step.

(b) Give a noninductive combinatorial proof of the formula for $f(n,k)$.

Notes  and  References 


We  conclude  this  chapter  with  a  table  of  the  numbers  of  each  of  the  four  basic  types  of  lists;  i.e., 
ordered  and  unordered  with  repetitions  allowed  or  forbidden.  It  is  given  in  Figure  1.4.  We  are 
selecting  k  things  from  an  n-set.  The  rules  governing  the  selections  are  listed  at  the  top  and  the  left 
of  the  figure.  As  indicated  in  the  figure,  these  numbers  can  also  be  interpreted  in  terms  of  placing 
balls  into  labeled  boxes. 
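The four basic list types can be compared by brute force for small $n$ and $k$. The Python sketch below counts each kind of $k$-list with `itertools` and sets the totals beside closed forms. The closed forms used ($n^k$, $n!/(n-k)!$, $\binom{n+k-1}{k}$ and $\binom{n}{k}$) are the standard ones for these four cases and are stated here as an assumption, since the body of Figure 1.4 is not reproduced in this text; the function names and dictionary keys are our own.

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)
from math import comb, perm


def list_counts(n, k):
    """Count the four kinds of k-lists from an n-set by direct enumeration."""
    items = range(n)
    return {
        "ordered, repeats allowed": sum(1 for _ in product(items, repeat=k)),
        "ordered, no repeats": sum(1 for _ in permutations(items, k)),
        "unordered, repeats allowed": sum(
            1 for _ in combinations_with_replacement(items, k)
        ),
        "unordered, no repeats": sum(1 for _ in combinations(items, k)),
    }


def formula_counts(n, k):
    """The same four numbers from closed forms (assumed, see the lead-in)."""
    return {
        "ordered, repeats allowed": n ** k,
        "ordered, no repeats": perm(n, k),
        "unordered, repeats allowed": comb(n + k - 1, k),
        "unordered, no repeats": comb(n, k),
    }
```

For $n = 5$ and $k = 3$ both functions give 125, 60, 35 and 10, in the order listed.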

The concepts in the Rules of Sum and Product were known in ancient times. Until fairly recently, combinatorics was synonymous with counting. This may be due to its connections with probability theory. You can learn about this in many books, but it is hard to do better than Feller's classic text [7]. We will focus on enumeration problems again in Chapters 4, 10 and 11.

Enumeration is still an active area of research in combinatorics. Although much of the research uses more sophisticated tools (see the notes to Chapters 4, 10 and 11), some current research relies only on clever elementary arguments. You may be able to find some papers of this sort by browsing through such combinatorial journals as the Journal of Combinatorial Theory, Series A, the European Journal of Combinatorics, and The Electronic Journal of Combinatorics (at http://www.combinatorics.org). Unfortunately, proofs are often given rather tersely and careless authors sometimes neglect to explain terminology. As examples of short papers that you may be able to read now, you may want to look at [6] and [10]. Lubell's proof, which was used in Example 1.22, appeared in [9].

Because  of  the  fundamental  importance  of  counting,  it  is  discussed  in  almost  every  text  whose 
title  refers  to  combinatorics  or  discrete  mathematics.  A  few  of  the  texts  with  material  around  the 
level of this book are those by Biggs [2; Ch. 3], Bogart [3; Chs. 1, 2], Cohen [4; Chs. 2, 4], Stanton and
White [12; Ch. 1] and Tucker [13; Ch. 5]. More advanced treatments can be found in the books by
Comtet  [5],  Goulden  and  Jackson  [8]  and  Stanley  [11].  Anderson  [1]  starts  off  with  Lubell's  proof  of 
Sperner's  Theorem  (Example  1.22)  and  then  continues  with  other  topics  related  to  subsets  of  sets. 
His  text  is  an  example  of  the  breadth  of  combinatorics — it  does  not  discuss  enumeration  and  has 
practically  no  overlap  with  our  text. 

Many  papers  have  been  written  on  Catalan  numbers.  Stanley  [11,  v. 2]  lists  sixty-six  things 
counted  by  Catalan  numbers  in  his  Exercise  6.1.9  (pp. 219-229)  and  gives  a  partial  solution  to  the 
exercise  (pp. 256-265). 

Derivations of Stirling's formula (Theorem 1.5 (p. 12)) can be found in many places, including Feller's text [7; II.9, VII.2].

Figure 1.4 The four basic list enumerators for $k$-lists made from $n$-sets. They can also be interpreted as placing $k$ balls (either labeled or unlabeled) into $n$ labeled boxes. The ball and box interpretation is indicated parenthetically.


CHAPTER 2

Functions 


Introduction 


Functions play a fundamental role in nearly all of mathematics. Combinatorics is no exception.
In the next section we review the basic terminology and notation for functions. Permutations are
special  functions  that  arise  in  a  variety  of  ways  in  combinatorics.  Besides  studying  them  for  their 
own  interest,  we'll  see  them  as  a  central  tool  in  counting  objects  with  symmetries  and  as  a  source 
of  examples  for  decision  trees  in  later  chapters.  Section  3  discusses  some  combinatorial  aspects  of 
functions  that  relate  to  some  of  the  material  in  the  previous  chapter.  We  conclude  the  chapter  with 
a  discussion  of  Boolean  functions.  They  form  the  mathematical  basis  of  most  computer  logic. 


2.1 Some Basic Terminology


Terminology  for  Sets 


Except for the real numbers ($\mathbb{R}$), rational numbers ($\mathbb{Q}$) and integers ($\mathbb{Z}$), our sets are normally finite. The set of the first $n$ positive integers, $\{1, 2, \ldots, n\}$, will be denoted by $\underline{n}$.

Recall that $|A|$ is the number of elements in the set $A$. When it is convenient to do so, we'll assume that the elements of a set $A$ have been linearly ordered and denote the ordering by $a_1, a_2, \ldots, a_{|A|}$. Unless clearly stated otherwise, the ordering on a set of numbers is the numerical ordering. For example, the ordering on $\underline{n}$ is $1, 2, 3, \ldots, n$.

If $A$ and $B$ are sets, we write $A - B$ for the set of elements in $A$ that are not in $B$:

$$A - B = \{\, x \mid x \in A \text{ and } x \notin B \,\}.$$

(This is also written $A \setminus B$.)

If $A$ and $B$ are sets, recall from the previous chapter that the Cartesian product $A \times B$ is the set of all ordered pairs built from $A$ and $B$:

$$A \times B = \{\, (a, b) \mid a \in A \text{ and } b \in B \,\}.$$

We also call $A \times B$ the direct product of $A$ and $B$.

If $A = B = \mathbb{R}$, the real numbers, then $\mathbb{R} \times \mathbb{R}$, written $\mathbb{R}^2$, is frequently interpreted as coordinates of points in the plane. Two points are the same if and only if they have the same coordinates, which says the same thing as our definition of $(a, b) = (a', b')$. Recall that the direct product can be extended to any number of sets. How can $\mathbb{R} \times \mathbb{R} \times \mathbb{R} = \mathbb{R}^3$ be interpreted?

What are Functions?


Definition 2.1 Function If $A$ and $B$ are sets, a function from $A$ to $B$ is a rule that tells us how to find a unique $b \in B$ for each $a \in A$. We write $f: A \to B$ to indicate that $f$ is a function from $A$ to $B$. We call the set $A$ the domain of $f$ and the set $B$ the codomain of $f$. To specify a function completely you must give its domain, codomain and rule. The set of all functions from $A$ to $B$ is written $B^A$, for a reason we will soon explain. Thus $f: A \to B$ and $f \in B^A$ say the same thing.

In calculus you dealt with functions whose codomains were $\mathbb{R}$ and whose domains were contained in $\mathbb{R}$; for example, $f(x) = 1/(x^2 - 1)$ is a function from $\mathbb{R} - \{-1, 1\}$ to $\mathbb{R}$. You also studied functions of functions! The derivative is a function whose domain is all differentiable functions and whose codomain is all functions. If we wanted to use functional notation we could write $D(f)$ to indicate the function that the derivative associates with $f$. Can you see how to think of the integral as a function? This is a bit tricky because of the constant of integration. We won't pursue it.

Definition 2.2 One-line notation When $A$ is ordered, a function can be written in one-line notation as $(f(a_1), f(a_2), \ldots, f(a_{|A|}))$. Thus we can think of a function as an element of $B \times B \times \cdots \times B$, where there are $|A|$ copies of $B$. Instead of writing $B^{|A|}$ to indicate the set of all functions, we write $B^A$. Writing $B^{|A|}$ is incomplete because the domain is not specified. Instead, only its size is given.

Example 2.1 Using the notation To get a feeling for the notation used to specify a function, it may be helpful to imagine that you have an envelope or box that contains a function. In other words, this envelope contains all the information needed to completely describe the function. Think about what you're going to see when you open the envelope.

*       *       *       Stop and think about this!       *       *       *

You might see

$$P = \{a, b, c\}, \qquad g: P \to \underline{4}, \qquad g(a) = 3, \quad g(b) = 1 \quad\text{and}\quad g(c) = 4.$$

This tells you that the name of the function is $g$, the domain of $g$ is $P$, which is $\{a, b, c\}$, and the codomain of $g$ is $\underline{4} = \{1, 2, 3, 4\}$. It also tells you the values in $\underline{4}$ that $g$ assigns to each of the values in its domain. Someone else may have put

$$g: \{a, b, c\} \to \underline{4}, \qquad \text{ordering: } a, b, c, \qquad g = (3, 1, 4)$$

in the envelope instead. This describes the same function but doesn't give a name for the domain. On the other hand, it gives an order on the domain so that the function can be given in one-line form. Can you describe other possible envelopes for the same function?

What if the envelope contained only $g = (3, 1, 4)$? You've been cheated! You must know the domain of $g$ in order to know what $g$ is. What if the envelope contained

$$\text{the domain of } g \text{ is } \{a, b, c\}, \qquad \text{ordering: } a, b, c, \qquad g = (3, 1, 4)?$$

We haven't specified the codomain of $g$, but is it necessary since we know the values of the function? Our definition included the requirement that the codomain be specified, so this is not a complete definition. On the other hand, we frequently only need to know which values in its codomain $g$ actually takes on (here 1, 3 and 4), so we'll be sloppy in such cases and accept this as if it were a complete specification. □

Example 2.2 Counting functions By the Rule of Product, $|B^A| = |B|^{|A|}$. We can represent a subset $S$ of $A$ by a unique function $f: A \to \underline{2}$ where $f(x) = 1$ if $x \notin S$ and $f(x) = 2$ if $x \in S$. This proves that there are $2^{|A|}$ such subsets. We can represent a list of $k$ elements of a set $S$ with repetition allowed by a unique function $f: \underline{k} \to S$. In this representation, the list corresponds to the function written in one-line notation. (Recall that the ordering on $\underline{k}$ is the numerical ordering.) This proves that there are exactly $|S|^k$ such lists. □
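One-line notation translates directly into a tuple of values listed in the order of the domain, which makes the counts in Example 2.2 easy to check on a computer. The sketch below is illustrative only: the function names are ours, and it assumes the convention that the value 2 marks membership in the subset and the value 1 marks non-membership.

```python
from itertools import product


def all_functions(domain, codomain):
    """Every function from the ordered domain to the codomain, in one-line form."""
    return list(product(codomain, repeat=len(domain)))


def subset_from_function(domain, f):
    """Decode f: A -> {1, 2}; the value 2 marks x as a member of the subset."""
    return {x for x, value in zip(domain, f) if value == 2}


domain = ["a", "b", "c"]   # ordered as a, b, c
codomain = [1, 2, 3, 4]

functions = all_functions(domain, codomain)
subsets = {
    frozenset(subset_from_function(domain, f))
    for f in all_functions(domain, [1, 2])
}
```

Here `functions` has $4^3 = 64$ entries, one of which is `(3, 1, 4)`, the function $g$ of Example 2.1, and `subsets` has $2^3 = 8$ distinct members.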

Definition 2.3 Types of functions Let $f: A \to B$ be a function. If for every $b \in B$ there is an $a \in A$ such that $f(a) = b$, then $f$ is called a surjection (or an onto function). Another way to describe a surjection is to say that it takes on each value in its codomain at least once.

If $f(x) = f(y)$ implies $x = y$, then $f$ is called an injection (or a one-to-one function). Another way to describe an injection is to say that it takes on each value in its codomain at most once. The injections in $S^{\underline{k}}$ correspond to lists without repetitions.

If $f$ is both an injection and a surjection, it is called a bijection. The bijections of $A^A$ are called the permutations of $A$. If $f: A \to B$ is a bijection, we may talk about the inverse of $f$, written $f^{-1}$, which reverses what $f$ does. Thus $f^{-1}: B \to A$ and $f^{-1}(b)$ is that unique $a \in A$ such that $f(a) = b$. Note that $f(f^{-1}(b)) = b$ and $f^{-1}(f(a)) = a$. Do not confuse $f^{-1}$ with $1/f$. For example, if $f: \mathbb{R} \to \mathbb{R}$ is given by $f(x) = x^3 + 1$, then $1/f(x) = 1/(x^3 + 1)$ and $f^{-1}(y) = (y - 1)^{1/3}$.

Example 2.3 Using the notation We'll illustrate the ideas in the previous paragraph. Let $A = \underline{4}$, $B = \{a, b, c, d, e\}$ and $f = (d, c, d, a)$. Since the value $d$ is taken on twice by $f$, $f$ is not an injection. Since the value $b$ is not taken on by $f$, $f$ is not a surjection. (We could have said $e$ is not taken on, instead.) The function $(b, d, c, e)$ is an injection since there are no repeats in the list of values taken on by the function.

Now let $A = \underline{4}$, $B = \{x, y, z\}$ and $g = (x, y, x, z)$. Since every element of $B$ appears at least once in the list of values taken on, $g$ is a surjection.

Finally, let $A = B = \underline{4}$ and $h = (3, 1, 4, 2)$. The function is both an injection and a surjection. Hence, it is a bijection. Since the domain and codomain are the same and $h$ is a bijection, it is a permutation of $\underline{4}$. The inverse of $h$ is $(2, 4, 1, 3)$. □
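The tests used in Example 2.3 are mechanical once a function is written in one-line notation. The following Python sketch shows one way to carry them out; the helper names are our own, and `inverse` assumes its argument is a permutation of $\{1, \ldots, n\}$.

```python
def is_injection(values):
    """No value is taken on more than once."""
    return len(set(values)) == len(values)


def is_surjection(values, codomain):
    """Every element of the codomain is taken on at least once."""
    return set(values) == set(codomain)


def is_bijection(values, codomain):
    return is_injection(values) and is_surjection(values, codomain)


def inverse(values):
    """Inverse of a permutation of {1..n} given in one-line notation."""
    inv = [0] * len(values)
    for x, y in enumerate(values, start=1):
        inv[y - 1] = x      # f(x) = y, so the inverse sends y back to x
    return tuple(inv)
```

Applied to the example, `is_injection(("d", "c", "d", "a"))` is false, `is_surjection(("x", "y", "x", "z"), "xyz")` is true, and `inverse((3, 1, 4, 2))` returns `(2, 4, 1, 3)`.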

Example 2.4 Two-line notation Since one-line notation is a simple, brief way to specify functions, we'll use it frequently. If the domain is not a set of numbers, the notation is poor because we must first pause and order the domain. There are other ways to write functions which overcome this problem. For example, we could write $f(a) = 4$, $f(b) = 3$, $f(c) = 4$ and $f(d) = 1$. This could be shortened up somewhat to $a \to 4$, $b \to 3$, $c \to 4$ and $d \to 1$. By turning each of these sideways, we can shorten it even more: $\begin{pmatrix} a & b & c & d \\ 4 & 3 & 4 & 1 \end{pmatrix}$. For obvious reasons, this is called two-line notation. Since $x$ always appears directly over $f(x)$, there is no need to order the domain; in fact, we need not even specify the domain separately since it is given by the top line. If the function is a bijection, its inverse is obtained by interchanging the top and bottom lines.

The arrows we introduced in the last paragraph can be used to help visualize different properties of functions. Imagine that you've listed the elements of the domain $A$ in one column and the elements of the codomain $B$ in another column to the right of the domain. Draw an arrow from $a$ to $b$ if $f(a) = b$. Thus the heads of arrows are labeled with elements of $B$ and the tails with elements of $A$. Since $f$ is a function, no two arrows have the same tail. If $f$ is an injection, no two arrows have the same head. If $f$ is a surjection, every element of $B$ is on the head of some arrow. You should be able to describe the situation when $f$ is a bijection. □

Exercises


2.1.1. This exercise lets you check your understanding of the definitions. In each case below, some information about a function is given to you. Answer the following questions and give reasons for your answers: (The answers are given at the end of this problem set.)

(i) Have you been given enough information to specify the function; i.e., would this be enough data for a function envelope?

(ii) Can you tell whether or not the function is an injection? a surjection? a bijection? If so, what is it?

(iii) If possible, give the function in two-line form.

(a)  /G3^^.*.«.*},    /  =  (3,1,2,3). 

(b)  /€{9,*,^,*}^,     /  =  (*,9,*). 

(c)  /  G  #,     2^3,     1^4,     3  ^  2. 

2.1.2. Let $A$ and $B$ be finite sets and $f: A \to B$. Prove the following claims. Some are practically restatements of the definitions, some require a few steps.

(a) If $f$ is an injection, then $|A| \le |B|$.

(b) If $f$ is a surjection, then $|A| \ge |B|$.

(c) If $f$ is a bijection, then $|A| = |B|$.

(d) If $|A| = |B|$, then $f$ is an injection if and only if it is a surjection.

(e) If $|A| = |B|$, then $f$ is a bijection if and only if it is an injection or it is a surjection.


Answers 

2.1.1. (a) We know the domain and codomain of $f$. By Exercise 2, $f$ cannot be an injection. Since no order is given for the domain, the attempt to specify $f$ in one-line notation is meaningless. If the attempt at specification makes any sense, it tells us that $f$ is a surjection. We cannot give it in two-line form since we don't know the function.

(b) We know the domain and codomain of $f$ and the domain has an implicit order. Thus the one-line
notation  specifies  /.  It  is  an  injection  but  not  a  surjection.  In  two  line  form  it  is  (  ^  i§  ^  )  • 

(c) This function is specified and is an injection. In one-line notation it would be $(4, 3, 2)$, and, in two-line notation, $\begin{pmatrix} 1 & 2 & 3 \\ 4 & 3 & 2 \end{pmatrix}$.


2.2 Permutations


Before beginning our discussion, we need the notion of composition of functions. Suppose that $f$ and $g$ are two functions such that the values $f$ takes on are contained in the domain of $g$. We can write this as $f: A \to B$ and $g: C \to D$ where $f(a) \in C$ for all $a \in A$. We define the composition of $g$ and $f$, written $gf: A \to D$, by $(gf)(x) = g(f(x))$ for all $x \in A$. The notation $g \circ f$ is also used to denote composition. Suppose that $f$ and $g$ are given in two-line notation by

$$f = \begin{pmatrix} p & q & r & s \\ P & R & T & U \end{pmatrix} \qquad g = \begin{pmatrix} P & Q & R & S & T & U & V \\ 1 & 3 & 5 & 2 & 4 & 6 & 7 \end{pmatrix}.$$

Then $gf = \begin{pmatrix} p & q & r & s \\ 1 & 5 & 4 & 6 \end{pmatrix}$.

The set of permutations on a set $A$ is denoted in various ways in the literature. Two notations are PER($A$) and $S(A)$. Suppose that $f$ and $g$ are permutations of a set $A$. Recall that a permutation is a bijection from a set to itself and so it makes sense to talk about $f^{-1}$ and $fg$. We claim that $fg$ and $f^{-1}$ are also permutations of $A$. This is easy to see if you write the permutations in two-line form and note that the second line is a rearrangement of the first if and only if the function is a permutation.

Again suppose that $f$ is a permutation. Instead of $f \circ f$ or $ff$ we write $f^2$. Note that $f^2(x)$ is not $(f(x))^2$. (In fact, if multiplication is not defined in $A$, $(f(x))^2$ has no meaning.) We could compose three copies of $f$. The result is written $f^3$. In general, we can compose $k$ copies of $f$ to obtain $f^k$. A cautious reader may be concerned that $f \circ (f \circ f)$ may not be the same as $(f \circ f) \circ f$. They are equal. In fact, $f^{k+m} = f^k \circ f^m$ for all nonnegative integers $k$ and $m$, where $f^0$ is defined by $f^0(x) = x$ for all $x$ in the domain. This is based on the "associative law" which states that $f \circ (g \circ h) = (f \circ g) \circ h$ whenever the compositions make sense. We'll prove these results.

To prove that two functions are equal, it suffices to prove that they take on the same values for all $x$ in the domain. Let's use this idea for $f \circ (g \circ h)$ and $(f \circ g) \circ h$. We have

$$\begin{aligned}
(f \circ (g \circ h))(x) &= f((g \circ h)(x)) && \text{by the definition of } \circ, \\
&= f(g(h(x))) && \text{by the definition of } \circ.
\end{aligned}$$

Similarly

$$\begin{aligned}
((f \circ g) \circ h)(x) &= (f \circ g)(h(x)) && \text{by the definition of } \circ, \\
&= f(g(h(x))) && \text{by the definition of } \circ.
\end{aligned}$$

More generally, one can use this approach to prove by induction that $f_1 \circ f_2 \circ \cdots \circ f_n$ is well defined. This result then implies that $f^{k+m} = f^k \circ f^m$. Note that we have proved the associative law for any three functions $f$, $g$ and $h$ for which the domain of $f$ contains the values taken on by $g$ and the domain of $g$ contains the values taken on by $h$.
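Composition and powers are easy to experiment with when permutations of $\{1, \ldots, n\}$ are stored in one-line notation. The Python sketch below is an illustration under that assumption; the names `compose` and `power` and the three sample permutations are our own choices, and a check on a few examples is of course not a proof of the associative law.

```python
def compose(f, g):
    """Return fg, where (fg)(x) = f(g(x)); f and g are one-line tuples on {1..n}."""
    return tuple(f[g[x - 1] - 1] for x in range(1, len(g) + 1))


def power(f, k):
    """Compose k copies of f; the zeroth power is the identity."""
    result = tuple(range(1, len(f) + 1))
    for _ in range(k):
        result = compose(f, result)
    return result


# Sample permutations of {1, ..., 5}, chosen only for illustration.
f = (2, 1, 4, 5, 3)
g = (2, 3, 4, 5, 1)
h = (5, 4, 3, 2, 1)

associative = compose(f, compose(g, h)) == compose(compose(f, g), h)
exponent_law = power(f, 2 + 3) == compose(power(f, 2), power(f, 3))
commutes = compose(f, g) == compose(g, f)
```

For these samples `associative` and `exponent_law` are true while `commutes` is false, which is a reminder that the order of composition matters even though the grouping does not.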

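Example 2.5 Using two-line and composition notations Let $f$ and $g$ be the permutations

$$f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 4 & 5 & 3 \end{pmatrix} \qquad g = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 4 & 5 & 1 \end{pmatrix}.$$

We can compute $fg$ by calculating all the values. This can be done fairly easily from the two-line form: For example, $(fg)(1)$ can be found by noting that the image of 1 under $g$ is 2 and the image of 2 under $f$ is 1. Thus $(fg)(1) = 1$. You should be able to verify that

$$fg = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 4 & 5 & 3 & 2 \end{pmatrix} \quad\text{and}\quad gf = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 2 & 5 & 1 & 4 \end{pmatrix} \ne fg$$

and that

$$f^2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 5 & 3 & 4 \end{pmatrix}, \quad f^3 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 3 & 4 & 5 \end{pmatrix} \quad\text{and}\quad g^5 = f^6 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}.$$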
Note that it is easy to get the inverse: simply interchange the two lines. Thus

$$f^{-1} = \begin{pmatrix} 2 & 1 & 4 & 5 & 3 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix} \quad\text{which is the same as}\quad f^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 5 & 3 & 4 \end{pmatrix},$$

since the order of the columns in two-line form does not matter. □

Let $f$ be a permutation of the set $A$ and let $n = |A|$. If $x \in A$, we can look at the sequence $x, f(x), f(f(x)), \ldots, f^k(x), \ldots$, which is often written as $x \to f(x) \to f(f(x)) \to \cdots \to f^k(x) \to \cdots$. Since the codomain of $f$ has $n$ elements, this sequence will contain a repeated element in the first $n+1$ entries. Suppose that $f^s(x)$ is the first sequence entry that is ever repeated and that $f^{s+p}(x)$ is the first time that it is repeated. Apply $(f^{-1})^s$ to both sides of the equality $f^s(x) = f^{s+p}(x)$ to obtain $x = f^p(x)$ and so, in fact, $s = 0$. It follows that the sequence cycles through a pattern of length $p$ forever since $f^{p+1}(x) = f(f^p(x)) = f(x)$, $f^{p+2}(x) = f^2(f^p(x)) = f^2(x)$, and so on. We call $(x, f(x), \ldots, f^{p-1}(x))$ the cycle containing $x$ and call $p$ the length of the cycle. If a cycle has length $p$, we call it a $p$-cycle. Cyclic shifts of a cycle are considered the same; for example, if (1,2,6,3) is the cycle containing 1 (as well as 2, 3 and 6), then (2,6,3,1), (6,3,1,2) and (3,1,2,6) are other ways of writing the cycle.

Example 2.6 Using cycle notation Consider the permutation

$$f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 2 & 4 & 8 & 1 & 5 & 9 & 3 & 7 & 6 \end{pmatrix}.$$

Since $1 \to 2 \to 4 \to 1$, the cycle containing 1 is (1,2,4). We could equally well write it (2,4,1) or (4,1,2); however, (1,4,2) is different since it corresponds to $1 \to 4 \to 2 \to 1$. The usual convention is to list the cycle starting with its smallest element. The cycles of $f$ are (1,2,4), (3,8,7), (5) and (6,9). We write $f$ in cycle form as

$$f = (1,2,4)\,(3,8,7)\,(5)\,(6,9).$$

It is common practice to omit the cycles of length one and write $f = (1,2,4)(3,8,7)(6,9)$. The inverse of $f$ is obtained by reading the cycles backwards because $f^{-1}(x)$ is the lefthand neighbor of $x$ in a cycle. Thus

$$f^{-1} = (4,2,1)(7,8,3)(9,6) = (1,4,2)(3,7,8)(6,9).$$ □
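The passage from one-line notation to cycle form follows the definition directly: start at an unused element and keep applying the permutation until the starting point returns. The Python sketch below is one possible implementation; the function names are ours, and it assumes a permutation of $\{1, \ldots, n\}$ stored as a tuple in one-line notation.

```python
def cycles(f):
    """Cycle form of a permutation of {1..n} given in one-line notation.

    Each cycle starts with its smallest element, and the cycles are listed
    in increasing order of that element. Cycles of length one are kept.
    """
    seen = set()
    result = []
    for start in range(1, len(f) + 1):
        if start in seen:
            continue
        cycle = []
        x = start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = f[x - 1]        # follow the arrow x -> f(x)
        result.append(tuple(cycle))
    return result


def inverse_cycles(cycle_list):
    """Read each cycle backwards, then rotate the smallest element to the front."""
    out = []
    for c in cycle_list:
        r = c[::-1]
        i = r.index(min(r))
        out.append(r[i:] + r[:i])
    return out
```

With `f = (2, 4, 8, 1, 5, 9, 3, 7, 6)`, `cycles(f)` returns `[(1, 2, 4), (3, 8, 7), (5,), (6, 9)]` and `inverse_cycles(cycles(f))` returns `[(1, 4, 2), (3, 7, 8), (5,), (6, 9)]`, in agreement with Example 2.6.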

Cycle  form  is  useful  in  certain  aspects  of  the  branch  of  mathematics  called  "finite  group  theory." 
We  will  find  it  useful  later  when  we  study  the  problem  of  counting  structures  having  symmetries. 
Here's  an  application  now. 

Example 2.7 Powers of permutations With a permutation in cycle form, it's very easy to calculate a power of the permutation. For example, suppose we want the tenth power of the permutation whose cycle form (including cycles of length 1) is (1,5,3)(7)(2,6). To find the image of 1, we take ten steps: $1 \to 5 \to 3 \to 1 \to \cdots$. Where does it stop after ten steps? Since three steps bring us back to where we started (because 1 is in a cycle of length three), nine steps take us around the cycle three times and the tenth takes us to 5. Thus $1 \to 5$ in the tenth power. Similarly, $5 \to 3$ and $3 \to 1$. Clearly $7 \to 7$ regardless of the power. Ten steps take us around the cycle (2,6) exactly five times, so $2 \to 2$ and $6 \to 6$. Thus the tenth power is (1,5,3)(7)(2)(6).
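The step-counting argument above amounts to reducing the exponent modulo each cycle length. A minimal Python sketch of that idea follows; the function name and the dictionary representation of the result are our own choices.

```python
def power_from_cycles(cycle_list, k):
    """Return the k-th power (k >= 0) as a dict sending x to its image."""
    image = {}
    for c in cycle_list:
        p = len(c)
        for i, x in enumerate(c):
            # k steps around a cycle of length p is the same as k mod p steps.
            image[x] = c[(i + k) % p]
    return image


tenth = power_from_cycles([(1, 5, 3), (7,), (2, 6)], 10)
```

Here `tenth` is `{1: 5, 5: 3, 3: 1, 7: 7, 2: 2, 6: 6}`, matching the cycle form (1,5,3)(7)(2)(6) found in the example.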

Suppose we have a permutation in cycle form whose cycle lengths all divide $k$. The reasoning in the previous paragraph shows that the $k$th power of that permutation will be the identity; that is, all the cycles will be 1-long and so every element is mapped to itself. In particular, if we are considering permutations of an $n$-set, every cycle has length at most $n$ and so we can take $k = n!$ regardless of the permutation. We have shown

Theorem 2.1 Given a set $S$, there is a $k$ depending on $|S|$ such that $f^k$ is the identity map for every permutation $f$ of $S$.

Without  cycle  notation,  it  would  be  harder  to  prove  the  theorem.  Q 
Example 2.8 Average cycle information What is the average number of elements fixed by
a random permutation? More generally, what is the average number that belong to cycles of length
k? What is the average number of cycles in a random permutation? We'll see that these questions
are easy to answer.

Make the n! permutations of {1, 2, ..., n} into a probability space by giving each permutation
probability 1/n!, the uniform distribution. We want to define random variables $X_i$, one for each
element of $\underline{n}$, such that their sum is the number of elements in k-cycles. Then $E(X_1 + \cdots + X_n)$ is
the average number of elements that belong to k-cycles. Let

\[ X_i = \begin{cases} 1, & \text{if } i \text{ belongs to a } k\text{-cycle}, \\ 0, & \text{otherwise.} \end{cases} \]

Note that $E(X_i)$ is the same for all i. By properties of expectation

\[ E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n) = n E(X_1). \]

It is shown in Exercise 2.2.2 that there are (n − 1)! permutations with $X_1 = 1$. Since each permutation
has probability 1/n!, $n E(X_1) = n \frac{(n-1)!}{n!} = 1$. In other words, on average one element belongs to a
k-cycle, regardless of the value of k as long as $1 \le k \le n$.

We now consider the average number of cycles in a permutation. To do this we want to define
random variables, one for each element of $\underline{n}$, such that their sum is the number of cycles. Let

\[ Y_i = \frac{1}{\text{length of the cycle containing } i}. \]

Since each element of a k-cycle contributes 1/k to the sum $Y_1 + \cdots + Y_n$, all k elements in the cycle
contribute a total of 1. Thus $Y_1 + \cdots + Y_n$ is the number of cycles in the permutation. We have

\begin{align*}
E(Y_1 + \cdots + Y_n) &= E(Y_1) + \cdots + E(Y_n) = n E(Y_1) \\
 &= n \sum_{k=1}^{n} \frac{1}{k} \Pr(1 \text{ is in a } k\text{-cycle}) && \text{by definition of } E \\
 &= n \sum_{k=1}^{n} \frac{1}{k} \cdot \frac{1}{n} && \text{by Exercise 2.2.2} \\
 &= \sum_{k=1}^{n} \frac{1}{k}.
\end{align*}

The last sum is approximately ln n because the sum is approximately $1 + \int_1^n \frac{dx}{x} = 1 + \ln n$. □
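
For small n, both averages of Example 2.8 can be checked by listing every permutation. The sketch below is a brute-force check under assumed names (`cycle_lengths`, `averages`); it relabels the n-set as {0, ..., n−1} for convenience and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import permutations

def cycle_lengths(p):
    """Cycle lengths of p, a tuple in one-line form on {0, ..., n-1}."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            length += 1
            x = p[x]
        lengths.append(length)
    return lengths

def averages(n, k):
    """Return (average number of elements in k-cycles, average number of cycles)."""
    perms = list(permutations(range(n)))
    in_k_cycles = sum(k * cycle_lengths(p).count(k) for p in perms)
    cycles = sum(len(cycle_lengths(p)) for p in perms)
    return Fraction(in_k_cycles, len(perms)), Fraction(cycles, len(perms))

n = 5
harmonic = sum(Fraction(1, j) for j in range(1, n + 1))  # 1 + 1/2 + ... + 1/n
for k in range(1, n + 1):
    print(k, averages(n, k), harmonic)
```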


*Example 2.9 Involutions An involution is a permutation which is equal to its inverse. Since
$f(x) = f^{-1}(x)$, we have $f^2(x) = f(f^{-1}(x)) = x$. Thus involutions are those permutations which
have all their cycles of lengths one and two. How many involutions are there on $\underline{n}$?

Let's count the involutions with exactly k 2-cycles and use the Rule of Sum to add up the results.
We can build such an involution as follows:

• Select 2k elements for the 2-cycles AND

• partition these 2k elements into k blocks that are all of size 2 AND

• put the remaining n − 2k elements into 1-cycles.

Since there is just one 2-cycle on two given elements, we can interpret each block as a 2-cycle. This
specifies f. The number of ways to carry out the first step is $\binom{n}{2k}$. The next step is trickier. A first
guess might be simply the multinomial coefficient $\binom{2k}{2,\ldots,2} = (2k)!/2^k$. This leads to the dilemma of
the poker hand with two pairs (Example 1.16): We're assuming an ordering on the pairs even though
they don't have one. For example, with k = 3 and the set $\underline{6}$, there are just 15 possible partitions as
follows.

{{1,2},{3,4},{5,6}}   {{1,2},{3,5},{4,6}}   {{1,2},{3,6},{4,5}}

{{1,3},{2,4},{5,6}}   {{1,3},{2,5},{4,6}}   {{1,3},{2,6},{4,5}}

{{1,4},{2,3},{5,6}}   {{1,4},{2,5},{3,6}}   {{1,4},{2,6},{3,5}}

{{1,5},{2,3},{4,6}}   {{1,5},{2,4},{3,6}}   {{1,5},{2,6},{3,4}}

{{1,6},{2,3},{4,5}}   {{1,6},{2,4},{3,5}}   {{1,6},{2,5},{3,4}}

This is smaller than $\binom{6}{2,2,2} = 6!/(2!\,2!\,2!) = 90$ because all 3! ways to order the three blocks in each
partition are counted differently. This is because we've chosen a first, second and third block instead
of simply dividing $\underline{6}$ into three blocks of size two.

How can we solve the dilemma? Actually, the discussion of what went wrong contains the key
to the solution: The multinomial coefficient counts ordered collections of k blocks and we want
unordered collections. Since the blocks in a partition are all distinct, there are k! ways to order the
blocks and so the multinomial coefficient counts each unordered collection k! times. Thus we must
simply divide the multinomial coefficient by k!.

If this dividing by k! bothers you, try looking at it this way. Let f(k) be the number of ways to
carry out Step 2. Since the k blocks can be permuted in k! ways, the Rule of Product tells us that
there are f(k) k! ways to select k ordered blocks of 2 elements each. Thus $f(k)\,k! = \binom{2k}{2,\ldots,2}$.

Since there is just one way to carry out Step 3, the Rule of Product tells us that the number of
involutions with exactly k 2-cycles is

\[ \binom{n}{2k} \frac{1}{k!} \binom{2k}{2,\ldots,2} = \frac{n!}{(2k)!\,(n-2k)!} \, \frac{1}{k!} \, \frac{(2k)!}{(2!)^k}. \]

Simplifying and using the Rule of Sum to combine the various possible values of k, we obtain

Theorem 2.2 The number of involutions of $\underline{n}$ is

\[ \sum_{k=0}^{\lfloor n/2 \rfloor} \frac{n!}{(n-2k)!\, 2^k\, k!}. \]

The notation $\lfloor x \rfloor$ stands for the largest integer in x; that is, the largest integer $m \le x$. It is also
called the floor of x. □
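
As a sanity check on Theorem 2.2, the sum can be compared with a direct count for small n. The following is a minimal sketch; the function names are made up for this illustration, and the n-set is relabeled {0, ..., n−1}.

```python
from itertools import permutations
from math import factorial

def involutions_formula(n):
    """Sum over k of n! / ((n-2k)! 2^k k!), as in Theorem 2.2."""
    return sum(factorial(n) // (factorial(n - 2 * k) * 2 ** k * factorial(k))
               for k in range(n // 2 + 1))

def involutions_brute_force(n):
    """Count the permutations p of {0, ..., n-1} with p(p(i)) = i for all i."""
    return sum(1 for p in permutations(range(n))
               if all(p[p[i]] == i for i in range(n)))

for n in range(7):
    print(n, involutions_formula(n), involutions_brute_force(n))
```

Each term of the sum counts something, so the integer division is exact.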

*Example 2.10 Permutation matrices and parity This example assumes some familiarity
with matrices and determinants.

Suppose f and g are permutations of $\underline{n}$. We can define an n × n matrix F to consist of zeroes
except that the (i, j)th entry, $F_{i,j}$, equals one whenever f(j) = i. Define G similarly. Then

\[ (FG)_{i,j} = \sum_{k=1}^{n} F_{i,k} G_{k,j} = F_{i,g(j)}, \]

since $G_{k,j} = 0$ except when g(j) = k. By the definition of F, this entry of F is zero unless f(g(j)) = i.
Thus $(FG)_{i,j}$ is zero unless (fg)(j) = i, in which case it is one. We have proven that FG corresponds
to fg. In other words:

Composition  of  permutations  corresponds  to  multiplication  of  matrices. 

It is also easy to prove that $f^{-1}$ corresponds to $F^{-1}$. For example, the permutations f = (1, 3, 2)(4)
and g = (1, 2, 3, 4), written in cycle form, correspond to

\[ F = \begin{pmatrix} 0&1&0&0 \\ 0&0&1&0 \\ 1&0&0&0 \\ 0&0&0&1 \end{pmatrix}
\quad\text{and}\quad
G = \begin{pmatrix} 0&0&0&1 \\ 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \end{pmatrix},
\quad\text{while}\quad
FG = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{pmatrix}, \]

which corresponds to the cycle form (1)(2)(3,4), which equals fg.

Using this correspondence, we can prove things such as $(fg)^{-1} = g^{-1} f^{-1}$ and $(f^k)^{-1} = (f^{-1})^k$
by noting that they are true for matrices F and G.

Note  that,  since  the  matrix  F  contains  exactly  one  1  in  each  row  and  column,  its  determinant 
is  either  +1  or  —1. 

Definition 2.4 Parity of a permutation If the matrix F corresponds to the permutation
f, we call det F the parity of f and write it $\chi(f)$. If $\chi(f) = +1$, we say f is even and, if
$\chi(f) = -1$, we say that f is odd.


The  rest  of  this  example  is  devoted  to  proving: 


Theorem 2.3 Properties of parity Suppose f and g are permutations of $\underline{n}$. The following
are true.

(a) $\chi(fg) = \chi(f)\chi(g)$.

(b) If g is f with the elements of $\underline{n}$ relabeled, then $\chi(f) = \chi(g)$.

(c) If f is written as a product of (not necessarily disjoint) cycles, then $\chi(f) = (-1)^k$, where k is
the number of cycles in the product that have even length. In particular, if f is written as
a product of k 2-cycles, then $\chi(f) = (-1)^k$.

(d) For n > 1, exactly half the permutations of $\underline{n}$ are even and exactly half are odd.

Let F and G be the matrices corresponding to f and g. Then (a) follows from
det(FG) = det(F) det(G).

In the next paragraph, we'll show that a relabeling can be written $G = P^t F P$ where P is a
permutation matrix and $P^t$ is the transpose of P. Take determinants:

\[ \chi(g) = \det G = \det(P^t F P) = \det P^t \, \det F \, \det P. \]

Now $\det P^t = \det P$ for any matrix P and $(\det P)^2 = 1$ since the determinant of a permutation
matrix is +1 or −1. Thus we have $\chi(g) = \det F = \chi(f)$.

We must prove the relabeling claim. To relabel, we must permute the columns of F in some
fashion and permute the rows of F in the same way. You should convince yourself that multiplying
a matrix M by a permutation matrix P gives a matrix MP in which the columns of M have been
permuted. Permuting the rows of F is the same as permuting the columns of $F^t$, except that the
resulting matrix is transposed. Thus $(F^t P)^t$ permutes the rows of F and so the rows and columns
are permuted by $(F^t P)^t P = P^t F P$.

If $f = c_1 \cdots c_k$, then $\chi(f) = \chi(c_1) \cdots \chi(c_k)$. Thus (c) follows if we know the parity of a single
cycle c. With appropriate relabeling, the matrix corresponding to an m-cycle c is the m × m matrix

\[ C_m = \begin{pmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & 0 & \cdots & 0
\end{pmatrix}. \]

Expanding about the first row, we have $\det C_m = -\det C_{m-1}$. Since $\det C_2 = -1$, it follows that
$\det C_m = (-1)^{m-1}$. Thus $C_m$ is odd if and only if m is even. This completes the proof of (c).

Let $\mathcal{P}_o$ and $\mathcal{P}_e$ be the sets of odd and even permutations of $\underline{n}$, respectively. Since $n \ge 2$,
the permutation t = (1, 2) is among the permutations. Since $\chi(t) = -1$, it follows from (a) that
$T_o : f \mapsto tf$ is a map from $\mathcal{P}_o$ to $\mathcal{P}_e$. Since $t^2$ is the identity function, $T_e : g \mapsto tg$ is the inverse of $T_o$.
You should be able to see that this implies that $T_o$ and $T_e$ are bijections and so $|\mathcal{P}_o| = |\mathcal{P}_e|$, proving (d).

For those familiar with group theory, the set of permutations of $\underline{n}$ is called the symmetric group
on n symbols and the subset of even permutations is called the alternating group on n symbols. □
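
Parts (a), (c) and (d) of Theorem 2.3 lend themselves to a quick experiment. The sketch below computes parity from cycle lengths, as in part (c), rather than from a determinant; the names `parity` and `compose` are assumptions of this illustration, and permutations are written in one-line form on {0, ..., n−1}.

```python
from itertools import permutations

def parity(p):
    """Parity of p: (-1) raised to the number of even-length cycles of p."""
    seen, sign = set(), 1
    for start in range(len(p)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            length += 1
            x = p[x]
        if length % 2 == 0:
            sign = -sign
    return sign

def compose(f, g):
    """One-line form of fg, where (fg)(j) = f(g(j))."""
    return tuple(f[g[j]] for j in range(len(g)))

perms = list(permutations(range(4)))
# (a): parity is multiplicative.
print(all(parity(compose(f, g)) == parity(f) * parity(g)
          for f in perms for g in perms))
# (d): exactly half of the permutations are even.
print(sum(1 for p in perms if parity(p) == 1), len(perms))
```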
Exercises


2.2.1. This exercise lets you check your understanding of cycle form. The answers are given at the end of
this problem set. A permutation is given in one-line, two-line or cycle form. Convert it to the other
two forms. Give its inverse in all three forms.

(a) (1,5,7,8) (2,3) (4) (6).

(b) $\begin{pmatrix} 1&2&3&4&5&6&7&8 \\ 8&3&7&2&6&4&5&1 \end{pmatrix}$.

(c)  (5,4,3,2,1),  which  is  in  one-line  form. 

(d)  (5,4,3,2,1),  which  is  in  cycle  form. 

2.2.2. Let f be a permutation of $\underline{n}$. The cycle of f that contains 1 is called the cycle generated by 1.

(a) Prove that the number of permutations in which the cycle generated by 1 has length n is
(n − 1)!.

(b) For $1 \le k \le n$, prove that the number of permutations in which the cycle generated by 1 has
length k is (n − 1)!, independent of the value of k. (Remember that a permutation must permute
all of $\underline{n}$.)

(c) Conclude that if $1 \le k \le n$ and a permutation of $\underline{n}$ is selected uniformly at random, then the
probability that 1 belongs to a k-cycle is 1/n, independent of k.

2.2.3.  A  carnival  barker  has  four  cups  upside  down  in  a  row  in  front  of  him.  He  places  a  pea  under  the  cup 
in  the  first  position.  He  quickly  interchanges  the  cups  in  the  first  and  third  positions,  then  the  cups 
in  the  first  and  fourth  positions  and  then  the  cups  in  the  second  and  third  positions.  This  entire  set 

of  interchanges  is  done  a  total  of  five  times.  Where  is  the  pea? 

Hint.  Write  one  entire  set  of  interchanges  as  a  permutation  in  cycle  form. 

2.2.4. Let $P_k(n)$ be the number of permutations of $\underline{n}$ all of whose cycles have length at most k. Thus
$P_1(n) = 1$ since only the identity permutation has all cycles of length 1. Also, $P_2(n)$ is the number
of involutions. For later convenience, define $P_k(0) = 1$.

(a) By considering the cycle containing n + 1, prove that $P_2(n+1) = P_2(n) + n P_2(n-1)$ for n > 0.

(b) State and prove a similar recursion for $P_3$.

2.2.5. A fixed point of a permutation f is an element x of the domain such that f(x) = x. A derangement
is a permutation f with no fixed points; i.e., $f(x) \ne x$ for all x.

(a) Prove that the probability that a random permutation f of $\underline{n}$ has f(k) = k equals 1/n.

(b) If we treat the n events $f(1) \ne 1, \ldots, f(n) \ne n$ as independent, what is the probability that
f is a derangement? Conclude that we might expect approximately n!/e derangements of $\underline{n}$. In
Example 4.5 (p. 99), you'll see that this heuristic estimate is extremely accurate.

(c) Let $D_n$ be the number of derangements of $\underline{n}$. Prove that the number of permutations of $\underline{n}$ with
exactly k fixed points is $\binom{n}{k} D_{n-k}$.

2.2.6. Let z(n, k) be the number of permutations of $\underline{n}$ having exactly k cycles. These are called the signless
Stirling numbers of the first kind. Our notation is not standard. The notation s(n, k) is commonly
used both for these and for the Stirling numbers of the first kind, which may differ from them in
sign.

(a) Prove the recursion

\[ z(n+1, k) = \sum_{i=0}^{n} \binom{n}{i}\, i!\; z(n-i, k-1). \]

(b) Give initial conditions and the range of n and k for which the recursion is valid.

(c) Construct a table of z(n, k) for $n \le 5$. Note: You can obtain a partial check on your calculations
by using $\sum_{k \ge 0} z(n, k) = n!$.
2.2.7. We want to compute the average of $|a_1 - a_2| + |a_2 - a_3| + \cdots + |a_{n-1} - a_n|$ over all permutations
$(a_1, \ldots, a_n)$ of $\underline{n}$.

(a) Show that the average equals

\[ \frac{2}{n} \sum_{1 \le i < j \le n} (j - i). \]

(b) Show that the answer is $\frac{n^2 - 1}{3}$.


Answers 

2.2.1. (a) $\begin{pmatrix} 1&2&3&4&5&6&7&8 \\ 5&3&2&4&7&6&8&1 \end{pmatrix}$ is the two-line form and (5,3,2,4,7,6,8,1) is the one-line form. (We'll omit the
two-line form in the future since it is simply the one-line form with 1, 2, ... placed above it.) The
inverse is (1,8,7,5) (2,3) (4) (6) in cycle form and (8,3,2,4,1,6,5,7) in one-line form.

(b)  The  cycle  form  is  (1,8)  (2,3,7,5,6,4).  The  inverse  in  cycle  form  is  (1,8)  (2,4,6,5,7,3)  and  in  one-line 
form  is  (8,4,2,6,7,5,3,1). 

(c)  The  cycle  form  is  (1,5)  (2,4)  (3).  The  permutation  is  its  own  inverse. 

(d)  This  is  not  the  standard  form  for  cycle  form.  Standard  form  is  (1,5,4,3,2).  The  one-line  form  is 
(5,1,2,3,4).  The  inverse  is  (1,2,3,4,5)  in  cycle  form  and  (2,3,4,5,1)  in  one  line  form. 
The one-line notation for a function is simply an (ordered) list, so there is a simple correspondence
(i.e., bijection) between lists and functions: A k-list from S is a function $f : \underline{k} \to S$. If f is an injection,
the list has no repeats.

How can we get unordered lists to correspond to functions? (Recall that unordered lists correspond
to sets or multisets depending on whether repeats are forbidden or not.) The secret is a two-step
process. First, think of a unique way to order the list, say $s_1, s_2, \ldots, s_k$. Second, interpret the
resulting list as a one-line function as done above. In mathematics, people refer to a unique thing
(or process or whatever) that has been selected as canonical. Thus one would probably speak of a
canonical ordering of the list rather than a unique ordering; however, both terms are correct.

Let's look at a small example. Here's a listing of all $\binom{5+3-1}{3} = 35$ of the unordered 3-lists with
repetition from $\underline{5}$. In the listing, an entry like 2,5,5 stands for the 3-list containing one 2 and two 5's.

1,1,1   1,1,2   1,1,3   1,1,4   1,1,5   1,2,2   1,2,3   1,2,4   1,2,5   1,3,3   1,3,4   1,3,5

1,4,4   1,4,5   1,5,5   2,2,2   2,2,3   2,2,4   2,2,5   2,3,3   2,3,4   2,3,5   2,4,4   2,4,5

2,5,5   3,3,3   3,3,4   3,3,5   3,4,4   3,4,5   3,5,5   4,4,4   4,4,5   4,5,5   5,5,5

We've simply arranged the elements in each of the 3-lists to be in "nondecreasing order." Let
$b_1, b_2, \ldots, b_k$ be an ordered list. We say the list is in nondecreasing order if the values are not
decreasing as we move from one element to the next; that is, if $b_1 \le b_2 \le \cdots \le b_k$. We say that $f \in \underline{n}^{\underline{k}}$
is a nondecreasing function if its one-line form is nondecreasing; that is, $f(1) \le f(2) \le \cdots \le f(k)$.
The list of lists we've created is a bijection between (i) the unordered 3-lists with repetition from
$\underline{5}$ and (ii) the nondecreasing functions in $\underline{5}^{\underline{3}}$ written in one-line notation. Thus, 3-multisets of $\underline{5}$
correspond to nondecreasing functions.
In a similar fashion we say that

\[ \text{the list is in }
\left\{ \begin{array}{l} \text{nonincreasing} \\ \text{decreasing} \\ \text{increasing} \end{array} \right\}
\text{ order if }
\left\{ \begin{array}{l} b_1 \ge b_2 \ge \cdots \ge b_k; \\ b_1 > b_2 > \cdots > b_k; \\ b_1 < b_2 < \cdots < b_k. \end{array} \right. \]
Again, this leads to similar terminology for functions. All such functions are also called monotone
functions. Some people say "strictly decreasing" when we say "decreasing," and likewise for (strictly)
increasing. This is a good practice because it helps avoid confusion, so we'll usually do it.
Also, people say weakly decreasing instead of nonincreasing, a convention we often adopt.

We  can  interchange  the  strings  "decreas"  and  "increas"  in  the  previous  paragraphs  and  read  the 
functions  in  the  list  backwards.  Do  it.  In  summary: 


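\[ \left. \begin{array}{rcl}
\text{nonincreasing} &=& \text{weakly decreasing} \\
\text{decreasing} &=& \text{strictly decreasing} \\
\text{nondecreasing} &=& \text{weakly increasing} \\
\text{increasing} &=& \text{strictly increasing}
\end{array} \right\}
\text{ means }
\left\{ \begin{array}{l}
b_1 \ge b_2 \ge \cdots \ge b_k; \\
b_1 > b_2 > \cdots > b_k; \\
b_1 \le b_2 \le \cdots \le b_k; \\
b_1 < b_2 < \cdots < b_k.
\end{array} \right. \]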


In the bijection we gave from 3-lists to functions, the 3-lists without repetition correspond to the
strictly increasing functions. For example, 1,2,3 and 1,3,4 correspond to strictly increasing functions
written in one-line notation. Thus 3-subsets of $\underline{5}$ correspond to strictly increasing functions. The
bijection between 3-subsets of $\underline{5}$ and strictly decreasing functions is given by

3,2,1   4,2,1   4,3,1   4,3,2   5,2,1   5,3,1   5,3,2   5,4,1   5,4,2   5,4,3.

The function (4, 3, 1) corresponds to the 3-subset {4, 3, 1} = {1, 3, 4} of $\underline{5}$.
All  these  things  are  special  cases  of 


Theorem 2.4 There is a bijection between unordered k-lists with repetition made from
$\underline{n}$ and the weakly increasing (resp. weakly decreasing) functions in $\underline{n}^{\underline{k}}$. In this correspondence,
lists without repetition (i.e., sets) correspond to the strictly increasing (resp. strictly decreasing)
functions.


Example 2.11 A bijection between strict and nonstrict functions Let m = n + k − 1. We
will construct a bijection between the weakly decreasing functions in $\underline{n}^{\underline{k}}$ and the strictly decreasing
functions in $\underline{m}^{\underline{k}}$.

Let $f \in \underline{n}^{\underline{k}}$ be a weakly decreasing function. Define a function g by g(i) = f(i) + k − i. This is
a strictly decreasing function because

\[ g(i) - g(i+1) = (f(i) + k - i) - (f(i+1) + k - (i+1)) = 1 + (f(i) - f(i+1)) \ge 1. \]

It has the same domain, $\underline{k}$, as f, but its codomain is $\underline{n+k-1} = \underline{m}$ because g(1) is k − 1 larger
than f(1) and f(1) may be as large as n. Let's give this map from weakly decreasing functions in
$\underline{n}^{\underline{k}}$ to strictly decreasing functions in $\underline{n+k-1}^{\underline{k}}$ the name $\varphi$. We will prove that $\varphi$ is a bijection.

It is easy to see that $f_1 \ne f_2$ implies that $\varphi(f_1) \ne \varphi(f_2)$ and so $\varphi$ is an injection. We claim it is
a bijection. To see this, suppose that $h \in \underline{m}^{\underline{k}}$ is strictly decreasing. Define f by f(i) = h(i) − (k − i).
This is a function in $\underline{n}^{\underline{k}}$ with $\varphi(f) = h$. To complete the proof, we must prove that it is weakly
decreasing. We have

\[ f(i) - f(i+1) = (h(i) - h(i+1)) - 1 \ge 1 - 1 = 0. \]

We can combine our knowledge that $\varphi$ is a bijection with Theorem 2.4 to obtain another
proof for the formula for the number of k-multisets that can be formed from $\underline{n}$. By Theorem 2.4, the
number of k-multisets that can be formed from $\underline{n}$ is the same as the number of weakly decreasing
functions from $\underline{k}$ to $\underline{n}$. By $\varphi$, the latter equals the number of strictly decreasing functions from $\underline{k}$ to $\underline{m}$.
By the theorem, this equals the number of k-subsets of $\underline{m}$. Since this last number is $\binom{m}{k} = \binom{n+k-1}{k}$,
there are that many k-multisets that can be formed from $\underline{n}$. This is very similar to a proof that was
given after Theorem 1.8 (p. 37). □
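
The map of Example 2.11 is easy to try out on a small case. The following sketch lists the weakly decreasing functions for n = 4 and k = 3, applies the map, and compares the result with the strictly decreasing functions into an m-set. The names `weakly_decreasing` and `phi` and the choice of n and k are assumptions of the illustration; functions are tuples in one-line form.

```python
from itertools import product
from math import comb

def weakly_decreasing(n, k):
    """All weakly decreasing functions from {1..k} to {1..n}, in one-line form."""
    return [f for f in product(range(1, n + 1), repeat=k)
            if all(f[i] >= f[i + 1] for i in range(k - 1))]

def phi(f):
    """g(i) = f(i) + k - i for i = 1, ..., k (tuple index i - 1)."""
    k = len(f)
    return tuple(f[i - 1] + k - i for i in range(1, k + 1))

n, k = 4, 3
m = n + k - 1
images = {phi(f) for f in weakly_decreasing(n, k)}
strict = {h for h in product(range(1, m + 1), repeat=k)
          if all(h[i] > h[i + 1] for i in range(k - 1))}

print(images == strict)           # phi hits every strictly decreasing function
print(len(strict), comb(m, k))    # both equal the number of k-subsets of an m-set
```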
Example 2.12 Unimodal sequences A sequence that is first weakly increasing and then
weakly decreasing is sometimes called unimodal (a term which is not strictly correct). The weakly
increasing or weakly decreasing part may be empty. Here are some examples of unimodal sequences:

1,3,3,4,2     1,1,1,1     1,3,5,5,4,2,2     1,2,3,2,1.

A variety of counting questions can be asked about unimodal sequences. Generally they can all be
handled by the following method:

1. Imagine breaking the sequence into three pieces:

\[ \underbrace{a_1 \le a_2 \le \cdots \le a_{k-1}}_{k-1 \text{ items}} < a_k \ge \underbrace{a_{k+1} \ge a_{k+2} \ge \cdots \ge a_\ell}_{\ell-k \text{ items}}, \]

where $a_k$ is the first occurrence of the largest element in the sequence.

2. Obtain a formula for the number of such sequences based on the value of $a_k$ and, possibly,
on the values of k and $\ell$.

3. Sum the result of the previous step.

We'll  do  a  couple  of  examples. 

How many unimodal sequences are there in which the inequalities are all strict and the elements
lie in $\underline{n}$? Suppose the largest element is t. The sequence elements preceding t must be a strictly
increasing sequence with all elements in $\underline{t-1}$. Since such sequences correspond to subsets of $\underline{t-1}$
there are $2^{t-1}$ choices for such a sequence. Similarly, there are $2^{t-1}$ choices for the elements after t.
Thus there are $(2^{t-1})^2$ strict unimodal sequences of positive integers with largest element t. Hence
the answer to the original question is

\[ \sum_{t=1}^{n} (2^{t-1})^2 = 1 + 4 + 16 + \cdots + 4^{n-1} = \frac{4^n - 1}{4 - 1} = \frac{4^n - 1}{3}. \]


How many weakly unimodal sequences of length $\ell$ are there whose elements lie in $\underline{n}$? Let $a_k = t$.

• The weakly increasing sequence preceding $a_k$ corresponds to a (k − 1)-multiset formed from
$\underline{t-1}$. There are $\binom{t+k-3}{k-1}$ of them. We have to be careful here because of the case t = 1.

• It turns out to be easier to treat the case t = 1 separately. There is only one weakly unimodal
sequence of length $\ell$ with largest term 1, namely the sequence of all ones.

• The weakly decreasing sequence following $a_k$ corresponds to an ($\ell$ − k)-multiset formed from $\underline{t}$.
There are $\binom{t+\ell-k-1}{\ell-k}$ of them.

Thus the number of weakly unimodal sequences of length $\ell$ whose elements lie in $\underline{n}$ is

\[ 1 + \sum_{t=2}^{n} \sum_{k=1}^{\ell} \binom{t+k-3}{k-1} \binom{\ell+t-k-1}{\ell-k}. \]
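
Both counts in Example 2.12 can be checked by exhaustive search for small parameters. This is a rough sketch under assumed names (`is_unimodal`, `count_strict`, `count_weak`, `weak_formula`); it follows the three-piece decomposition by locating the first occurrence of the largest element.

```python
from itertools import product
from math import comb

def is_unimodal(seq, strict=False):
    """True if seq rises to the first occurrence of its maximum and then falls."""
    peak = seq.index(max(seq))
    rising = list(zip(seq[:peak], seq[1:peak + 1]))
    falling = list(zip(seq[peak:], seq[peak + 1:]))
    if strict:
        return all(a < b for a, b in rising) and all(a > b for a, b in falling)
    return all(a <= b for a, b in rising) and all(a >= b for a, b in falling)

def count_strict(n):
    """Strict unimodal sequences with elements in {1..n}; length is at most 2n - 1."""
    return sum(1 for length in range(1, 2 * n)
               for seq in product(range(1, n + 1), repeat=length)
               if is_unimodal(seq, strict=True))

def count_weak(n, ell):
    """Weakly unimodal sequences of length ell with elements in {1..n}."""
    return sum(1 for seq in product(range(1, n + 1), repeat=ell)
               if is_unimodal(seq))

def weak_formula(n, ell):
    """The double sum from the example, with the case t = 1 contributing 1."""
    return 1 + sum(comb(t + k - 3, k - 1) * comb(ell + t - k - 1, ell - k)
                   for t in range(2, n + 1) for k in range(1, ell + 1))

print(count_strict(3), (4 ** 3 - 1) // 3)
print(count_weak(3, 3), weak_formula(3, 3))
```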
Image and Coimage


Again, let $f : A \to B$ be a function. The image of f is the set of values f actually takes on:
$\mathrm{Image}(f) = \{ f(a) \mid a \in A \}$. The definition of a surjection can be rewritten Image(f) = B.*

For each $b \in B$, the inverse image of b, written $f^{-1}(b)$, is the set of those elements in A whose
image is b; i.e.,

\[ f^{-1}(b) = \{ a \mid a \in A \text{ and } f(a) = b \}. \]

This extends our earlier definition of $f^{-1}$ from bijections to all functions; however, such an $f^{-1}$ can't
be thought of as a function from B to A unless f is a bijection because it will not give a unique $a \in A$
for each $b \in B$. (There is a slight abuse of notation here: If $f : A \to B$ is a bijection, our new notation
is $f^{-1}(b) = \{a\}$ and our old notation is $f^{-1}(b) = a$.)

The collection of nonempty inverse images of elements of B is called the coimage of f.

We claim that the coimage of f is the partition of A whose blocks are the maximal subsets of A
on which f is constant. For example, if $f \in \{a, b, c\}^{\underline{5}}$ is given in one-line form as (a, c, a, a, c), then

\[ \mathrm{Coimage}(f) = \{ f^{-1}(a), f^{-1}(c) \} = \{ \{1, 3, 4\}, \{2, 5\} \}, \]

since f is a on {1, 3, 4} and is c on {2, 5}.

We now prove the claim. If $x \in A$, let y = f(x). Then $x \in f^{-1}(y)$ and so the union of the
nonempty inverse images contains A. Clearly it does not contain anything which is not in A. If
$y_1 \ne y_2$, then we cannot have $x \in f^{-1}(y_1)$ and $x \in f^{-1}(y_2)$ because this would imply $f(x) = y_1$ and
$f(x) = y_2$, a contradiction. Thus Coimage(f) is a partition of A. Clearly $x_1$ and $x_2$ belong to the
same block if and only if $f(x_1) = f(x_2)$. Hence a block is a maximal set on which f is constant.
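
To make the definitions concrete, here is a small sketch that computes the image and coimage of the function (a, c, a, a, c) used above. Representing a function as a Python dictionary and the names `image` and `coimage` are choices made for this illustration only.

```python
def image(f):
    """The set of values that f actually takes on."""
    return set(f.values())

def coimage(f):
    """The partition of the domain into the nonempty inverse images."""
    blocks = {}
    for a, b in f.items():
        blocks.setdefault(b, set()).add(a)
    return {frozenset(block) for block in blocks.values()}

# One-line form (a, c, a, a, c): f(1) = a, f(2) = c, f(3) = a, f(4) = a, f(5) = c.
f = dict(zip(range(1, 6), "acaac"))
print(image(f))
print(coimage(f))
```

Grouping domain elements by their value is exactly the "maximal subsets on which f is constant" description of the blocks.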

Example 2.13 Blocks and Stirling numbers How many functions in $B^A$ have a coimage with
exactly k blocks?

This means that the coimage is a partition of A having exactly k blocks. Recall that S(n, k), a
Stirling number of the second kind, denotes the number of partitions of an n-set into k blocks. (See
Example 1.27 (p. 34).) There are S(|A|, k) ways to choose the blocks of the coimage. The partition
of A does not fully specify a function $f \in B^A$. To complete the specification, we must specify the
image of the elements in each block, in other words, an injection from the set of blocks to B. This is
an ordered selection of size k without replacement from B. There are |B|!/(|B| − k)! such injections,
independent of which k block partition of A we are considering. By the Rule of Product, there are
S(|A|, k) (|B|!/(|B| − k)!) functions $f \in B^A$ with |Coimage(f)| = k. □
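
The count in Example 2.13 can be compared with a brute-force tally. In the sketch below, `stirling2` uses the standard recursion S(n, k) = k S(n − 1, k) + S(n − 1, k − 1), which is an assumption here since this section does not state it, and the number of coimage blocks of a function is obtained as the number of distinct values it takes. All names are hypothetical.

```python
from itertools import product
from math import perm

def stirling2(n, k):
    """Stirling number of the second kind via the standard recursion (assumed)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def count_functions(a, b, k):
    """Functions from an a-set to a b-set whose coimage has exactly k blocks."""
    return sum(1 for f in product(range(b), repeat=a) if len(set(f)) == k)

a, b = 4, 3
for k in range(1, b + 1):
    # perm(b, k) = b! / (b - k)! counts the injections from the k blocks into B.
    print(k, count_functions(a, b, k), stirling2(a, k) * perm(b, k))
```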

We can describe the image and coimage of a function by the arrow pictures introduced at the
end of Example 2.4. Image(f) is the set of those $b \in B$ which appear as labels of arrowheads. A block
in Coimage(f) is the set of labels on the tails of those arrows that all have their heads pointing to
the same value; for example, the block of Coimage(f) arising from $b \in \mathrm{Image}(f)$ is the set of labels
on the tails of those arrows pointing to b.


*  The  image  is  also  called  the  range.  Unfortunately,  the  codomain  is  also  sometimes  called  the 
range. 
Example 2.14 Blocks of given sizes Let b be a vector of nonnegative integers such that
$\sum_{k \ge 1} k b_k = n$. Let B(n, b) be the number of partitions of $\underline{n}$ having exactly $b_k$ blocks of size k.

We call b the type of a partition and so B(n, b) is the number of partitions of type b.

Consider the possible coimages of functions $f : \underline{n} \to \underline{m}$. Since the coimage is a partition of $\underline{n}$, we
can talk about the type of the coimage, too. There is a restriction: Since the coimage can't have
more blocks than the size of the codomain, $b_1 + b_2 + \cdots \le m$.

We want to compute B(n, b). To do this, we first partition $\underline{n}$ into ordered blocks where there
are $k b_k$ elements in block k for each k. Next, for each k, we partition block k into $b_k$ blocks each
of size k. For the first step, all we need is a multinomial coefficient since that step is precisely the
multinomial coefficient setup. The second step would also be a multinomial coefficient setup if we
had put the blocks of size k in an ordered list. Since there are $b_k!$ ways to order the $b_k$ blocks of size
k, we need to divide that multinomial coefficient by $b_k!$. This gives us

\[ B(n, b) = \binom{n}{b_1, 2b_2, 3b_3, \ldots} \prod_{k \ge 1} \frac{1}{b_k!} \binom{k b_k}{\underbrace{k, k, \ldots, k}_{b_k \text{ copies of } k}} = n! \prod_{b_k \ne 0} \frac{1}{b_k!\,(k!)^{b_k}}. \]
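
The closed form for B(n, b) can be tested against an explicit enumeration of set partitions. The sketch below is only illustrative: the generator `set_partitions` and the convention of passing the type b as a dictionary from block size k to the count of blocks of that size are assumptions made for this example.

```python
from math import factorial

def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def count_by_type(n, b):
    """Brute-force B(n, b); b maps a block size k to the required count b_k."""
    target = {k: c for k, c in b.items() if c}
    total = 0
    for partition in set_partitions(list(range(1, n + 1))):
        sizes = {}
        for block in partition:
            sizes[len(block)] = sizes.get(len(block), 0) + 1
        total += (sizes == target)
    return total

def formula(n, b):
    """n! divided by the product of b_k! (k!)^(b_k)."""
    denominator = 1
    for k, c in b.items():
        denominator *= factorial(c) * factorial(k) ** c
    return factorial(n) // denominator

print(count_by_type(6, {2: 3}), formula(6, {2: 3}))
print(count_by_type(5, {1: 1, 2: 2}), formula(5, {1: 1, 2: 2}))
```

The first case reproduces the 15 ways to split a 6-set into three blocks of size two.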


The  Pigeonhole  Principle 


The Pigeonhole Principle is a method for obtaining statements of the form

If $n \ge g(d)$, then $\mathcal{A}(n, d)$,

where $\mathcal{A}$ is a statement and g is some function depending on $\mathcal{A}$. For example, we'll see that, if
$n \ge (d-1)^2 + 1$, then any sequence of n distinct numbers contains a monotonic subsequence of length d.
(What follows "then" is $\mathcal{A}(n, d)$.)

Here  is  a  statement  of  the  principle  in  two  forms. 


Theorem 2.5 Pigeonhole Principle Function form: Suppose A and B are sets with
|A| > |B|. Then for every function $f : A \to B$ there is a $b \in B$ with $|f^{-1}(b)| > 1$.
Partition form: Suppose $\mathcal{P}$ is a partition of the set A into fewer than |A| blocks. Then some block
contains more than one element of A.


You should be able to prove this theorem. In fact, it is so simple we should probably not even call
it a theorem. To see why the two forms of the theorem are equivalent, first suppose $f : A \to B$. Let
$\mathcal{P}$ be the coimage of f. It must have at most |B| < |A| blocks. Conversely suppose $\mathcal{P}$ is a partition
of A. Number the blocks in some fashion from 1 to $|\mathcal{P}|$. Let $B = \{1, \ldots, |\mathcal{P}|\}$ and define $f : A \to B$
by letting f(a) be the number of the block that contains a.

Where  did  the  rather  strange  name  "Pigeonhole  Principle"  come  from?  Old  style  desks  often 
had  what  looked  like  a  stacked  array  of  boxes  that  were  open  in  the  front.  These  boxes  were  usually 
used  to  hold  various  letters  and  folded  or  rolled  papers.  The  boxes  were  called  pigeon  holes  because 
of their resemblance to nesting boxes in pigeon coops. If |A| documents are placed in |B| pigeonholes in
a desk and |A| > |B|, then at least one pigeonhole contains at least two documents.
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Example  2.15  Subset  sums  Given  a  set  P  of  integers,  let  ^  P  be  the  sum  of  the  elements  of 
P.  We  say  that  a  set  S  of  positive  integers  has  the  two-sum  property  if  there  are  subsets  P  and  Q 
of  S  with  P  ^  Q  and  ^P  =  ^Q-  Not  all  sets  have  the  two-sum  property.  For  example,  {1, 2, 4, 8} 
does  not,  but  {1,2,4,5}  has  the  two-sum  property — take  P  =  {1,4}  and  Q  =  {5}.  The  set 
S  =  {1,  2, 4,  8}  fails  because  its  elements  grow  too  rapidly.  Can  we  put  some  condition  on  \S\  and 
max  5,  the  largest  element  of  S,  so  that  S  must  have  the  two  sum  property?  Since  n  can't  be  too 
large,  we  might  expect  a  statement  of  the  form 

    If $|S| \ge k$ and $\max S < h(k)$, then $S$ has the two-sum property.    (2.1)

If a subset of $S$ has the two-sum property, then so does $S$. Hence it suffices to prove (2.1) when
$|S| = k$, since any $S$ with $|S| > k$ can be replaced by a subset $S'$ of size $k$. Thus (2.1) is equivalent to

    If $|S| = k$ and $\max S < h(k)$, then $S$ has the two-sum property.

Since we want to look for repeated values of subset sums, our function $f$ will map subsets to
their sums. Thus $A = 2^S$, the subsets of $S$, and $f : A \to \{0, 1, \ldots, \sum S\}$ is defined by $f(P) = \sum P$.

To apply the Pigeonhole Principle we need the cardinality of the domain and codomain of $f$.
We have $|A| = 2^{|S|} = 2^k$. Since $B = \{0, 1, \ldots, \sum S\}$, we have $|B| = \sum S + 1$. For the Pigeonhole
Principle, we need $2^k > \sum S + 1$. To get this, we need an upper bound for $\sum S$. Let $\max S = n$ and
notice that

$$\sum S \le \underbrace{n + (n-1) + (n-2) + \cdots + (n-k+1)}_{k\ \text{terms}} \le nk.$$

Thus it suffices to have $2^k > nk + 1$. This is easily solved to get an inequality for $n$. Recalling that
$n = \max S$ and $k = |S|$, we have proved that any set $S$ of positive integers has the two-sum property
if

$$\max S < \frac{2^{|S|} - 1}{|S|}. \qquad \Box$$
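For small sets the two-sum property can be tested directly, which gives a way to experiment with the bound. The sketch below is a brute-force search over all subsets; the function names `two_sum_witness` and `bound_guarantees` are ours, not the text's.

```python
from itertools import combinations

def two_sum_witness(S):
    """Return two distinct subsets of S with equal sums, or None if none exist."""
    seen = {}                      # subset sum -> first subset found with that sum
    elems = sorted(S)
    for r in range(len(elems) + 1):
        for P in combinations(elems, r):
            total = sum(P)
            if total in seen:      # two different subsets landed in one pigeonhole
                return set(seen[total]), set(P)
            seen[total] = P
    return None

def bound_guarantees(S):
    """True when max S < (2^|S| - 1)/|S|, the sufficient condition of Example 2.15."""
    k = len(S)
    return max(S) * k < 2 ** k - 1

print(two_sum_witness({1, 2, 4, 8}))      # None
print(two_sum_witness({1, 2, 4, 5}))      # ({5}, {1, 4})
print(bound_guarantees({1, 2, 4, 5}))     # False
print(bound_guarantees({2, 3, 4, 5, 6}))  # True
```

Note that $\{1, 2, 4, 5\}$ has the two-sum property even though it fails the test `bound_guarantees`: the inequality is sufficient, not necessary.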

Example 2.16 Monotonic subsequences  We will prove

Theorem 2.6  Given a sequence of more than $mn$ distinct numbers, there must be either a
subsequence of $n+1$ decreasing numbers or a subsequence of $m+1$ increasing numbers.

A subsequence need not be consecutive; for example, 1, 2, 3 is a subsequence of 7, 1, 6, 4, 2, 4, 3.

The theorem is best possible because the following sequence contains $nm$ numbers, the longest
decreasing subsequence has length $n$ and the longest increasing subsequence has length $m$.

$$n, (n-1), \ldots, 1, \quad 2n, (2n-1), \ldots, n+1, \quad 3n, (3n-1), \ldots, 2n+1, \quad \ldots, \quad mn, (mn-1), \ldots, (m-1)n+1. \qquad (2.2)$$

How can we prove the theorem? Here is a fairly natural approach which turns out to be incorrect.
We could try to "grow" sequences as follows. Let $a_1, \ldots, a_\ell$ be the given sequence. Then $a_\ell$ is both
an increasing and a decreasing sequence starting at $\ell$. We back up from $\ell$ one step at a time until we
reach 1. Suppose we have decreasing and increasing sequences starting at $t$. If $a_{t-1} < a_t$, we can put
$a_{t-1}$ at the front of our increasing sequence and increase its length by one. If $a_{t-1} > a_t$, we can put
$a_{t-1}$ at the front of our decreasing sequence and increase its length by one. Each step increases one
length or the other by 1. Thus, when we reach $a_1$, the sum of the lengths is the initial sum plus the
number of steps: $2 + (\ell - 1) = \ell + 1$. Thus, if $\ell \ge m + n$, we have either an increasing subsequence
longer than $m$ or a decreasing one longer than $n$. This can't be right: it's better than the claim in
the theorem and (2.2) tells us that's best possible. Can you see what is wrong?

*        *        *       Stop  and  think  about  this!        *        *  * 

We must compare $a_{t-1}$ with the starts of the increasing and decreasing subsequences that we've
built so far and, when $t < \ell$, only one of these subsequences begins with $a_t$.

Can we salvage anything from what we did? Suppose no decreasing subsequence is longer than
$n$ and no increasing subsequence is longer than $m$. Let $I_t$ be the length of the longest increasing
subsequence that starts with $a_t$ and let $D_t$ be the length of the longest decreasing subsequence that
starts with $a_t$. Define a map $f : \underline{\ell} \to \underline{m} \times \underline{n}$ by $f(t) = (I_t, D_t)$. We'll show that $f$ is an injection. By
the Pigeonhole Principle, we cannot have $|A| > |B|$, where $A = \underline{\ell}$ and $B = \underline{m} \times \underline{n}$. In other words, $\ell \le mn$. The contrapositive
is the theorem because it says that if $\ell > mn$, then it is not true that both (a) no increasing
subsequence exceeds $m$ and (b) no decreasing subsequence exceeds $n$. In other words, if $\ell > mn$,
there is either an increasing subsequence longer than $m$ or a decreasing subsequence longer than $n$.

It remains to show that $f$ is an injection. Suppose $i < j$. You should be able to see that

    If $a_i < a_j$, we have $I_i > I_j$.
    If $a_i > a_j$, we have $D_i > D_j$.

Since the numbers are distinct, one of these two cases applies, so $f(i) \ne f(j)$.
This completes the proof. Another proof is given in the exercises. □
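The pairs $(I_t, D_t)$ are easy to compute, so the injection can be checked on examples. The following sketch uses 0-based positions rather than the text's $1, \ldots, \ell$, and the helper names are our own; `extremal_sequence` builds the sequence (2.2).

```python
def longest_from_each_start(a):
    """Return lists I, D where I[t] (resp. D[t]) is the length of the longest
    increasing (resp. decreasing) subsequence of a that starts with a[t]."""
    n = len(a)
    I = [1] * n
    D = [1] * n
    for t in range(n - 1, -1, -1):      # back up from the end, as in the text
        for j in range(t + 1, n):
            if a[t] < a[j]:
                I[t] = max(I[t], 1 + I[j])
            elif a[t] > a[j]:
                D[t] = max(D[t], 1 + D[j])
    return I, D

def extremal_sequence(m, n):
    """Sequence (2.2): m decreasing runs of length n, each run above the last."""
    return [b * n - i for b in range(1, m + 1) for i in range(n)]

seq = extremal_sequence(3, 4)           # 4,3,2,1, 8,7,6,5, 12,11,10,9
I, D = longest_from_each_start(seq)
pairs = list(zip(I, D))

print(max(I), max(D))                   # 3 4
print(len(set(pairs)) == len(seq))      # True: the map t -> (I_t, D_t) is injective
```

With $mn = 12$ terms the bounds $m = 3$ and $n = 4$ are attained but not exceeded; appending one more distinct number must push one of the two maxima up.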
Exercises 


2.3.1. This exercise lets you check your understanding of the definitions. In each case below, some
information about a function is given to you. Answer the following questions and give reasons for your
answers. (The answers are given at the end of this problem set.)

(i) Have you been given enough information to specify the function; i.e., would this be enough
data for a function envelope?

(ii) Can you tell whether or not the function is an injection? a surjection? a bijection? If so,
what is it?

(a) $f \in \underline{4}^{\underline{5}}$,  Coimage($f$) = $\{\{1,3,5\}, \{2,4\}\}$.

(b) $f \in \underline{5}^{\underline{5}}$,  Coimage($f$) = $\{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}$.

(c) $f \in \underline{4}^{\underline{5}}$,  $f^{-1}(2) = \{1,3,5\}$,  $f^{-1}(4) = \{2,4\}$.

(d) $f \in \underline{4}^{\underline{5}}$,  |Image($f$)| = 4.

(e) $f \in \underline{4}^{\underline{5}}$,  |Image($f$)| = 5.

(f) $f \in \underline{4}^{\underline{5}}$,  |Coimage($f$)| = 5.

2.3.2. Let $A$ and $B$ be finite sets and let $f : A \to B$ be a function. Prove the following claims.

(a) |Image($f$)| = |Coimage($f$)|.

(b) $f$ is an injection if and only if |Image($f$)| = $|A|$.

(c) $f$ is a surjection if and only if |Coimage($f$)| = $|B|$.

2.3.3. Let $b$ be a vector of nonnegative integers such that $\sum_{k \ge 1} k b_k = n$, let $C(n, b)$ be the number of
permutations of $\underline{n}$ having exactly $b_k$ cycles of length $k$, and let $B(n, b)$ be the number of partitions
of $\underline{n}$ having exactly $b_k$ blocks of size $k$. Prove that

$$C(n, b) = B(n, b) \prod_{k \ge 1} \bigl((k-1)!\bigr)^{b_k}.$$

2.3.4. How many strictly unimodal sequences are there in which the largest element is in the exact middle
of the sequence and no element exceeds $n$?

2.3.5. How can we get a bijection between partitions of a set and a reasonable class of functions? Does the
notion of coimage help? What about partitions with exactly $k$ blocks? This exercise deals with these
issues.

(a) Given a partition of $\underline{n}$, let block 1 be the block containing 1. If the first $k$ blocks have been
defined, let block $k + 1$ be the remaining block that contains the smallest remaining element.
This numbers the blocks of a set partition. Number the blocks of the partition

    {3,9,2}, {4,1,6}, {5}, {7,8}.

(b) Given a partition $B$ of $\underline{n}$, associate with it a function $f \in \underline{n}^{\underline{n}}$ as follows. Number the blocks of
$B$ as above. Let $f(i)$ be the number of the block of $B$ that contains $i$. Prove that if two partitions
are different, then the functions associated with them are different.

(c) What is the coimage of the function associated with a partition $B$ of $\underline{n}$? (Phrase your answer as
simply as possible in terms of $B$.)

(d) Call a function $f \in \underline{n}^{\underline{n}}$ a restricted growth function if $f(1) = 1$ and $f(i) - 1$ is at most the
maximum of $f(k)$ over all $k < i$. Which of the following functions in one-line form are restricted
growth functions? Give reasons for your answers.

    2,2,3,3    1,2,3,3,2,1    1,1,1,3,3    1,2,3,1.

(e) Prove that a function is associated with a partition of $\underline{n}$ if and only if it is a restricted growth
function. (Since you already proved that different partitions have different functions associated
with them, this is the desired bijection.)

(f) For $\underline{4}$, list in lexicographic order all restricted growth functions and, for each function, give the
partition of $\underline{4}$ that corresponds to it.

2.3.6. Prove the generalized Pigeonhole Principle: Suppose that a set $S$ is partitioned into $k$ blocks. Prove
that some block must have more than $(|S| - 1)/k$ elements.

2.3.7. Let $f : A \to B$. Use the generalized Pigeonhole Principle to obtain a lower bound on the size of the
largest block in the coimage of $f$.

2.3.8. Suppose $n + 1$ numbers are selected from $\underline{2n}$. Prove that we must have selected two numbers such
that one is a multiple of the other.

Hint. Write each number in the form $2^k m$ where $m$ is odd. Study occurrences of values of $m$ using
the Pigeonhole Principle.

2.3.9. Prove Theorem 2.6 using the generalized Pigeonhole Principle (Exercise 2.3.6) as follows. Given
$a_1, \ldots, a_\ell$, define $f : \underline{\ell} \to B$ by letting $f(t)$ be the length of the longest decreasing subsequence
starting with $a_t$. If $i < j < \cdots$ and $f(i) = f(j) = \cdots$, look at the subsequence $a_i, a_j, \ldots$.

2.3.10. Given $N$, how large must $t$ be so that every set $S$ containing at least $t$ positive integers satisfies the
following condition? There are elements $a, b, c, d \in S$ such that (i) $a + b$ and $c + d$ have the same
remainder when divided by $N$ and (ii) $\{a, b\} \ne \{c, d\}$.
Hint. Look at remainders when sums of pairs are divided by $N$.

*2.3.11. Given a set $S$ of integers, prove that the elements of some nonempty subset of $S$ sum to a multiple
of $|S|$.

Answers

2.3.1. (a) The domain and codomain of $f$ are specified and $f$ takes on exactly two distinct values. $f$ is not
an injection. Since we don't know the values $f$ takes, $f$ is not completely specified; however, it
cannot be a surjection because it would have to take on all four values in its codomain.

(b) Since each block in the coimage has just one element, $f$ is an injection. Since |Coimage($f$)| = 5 =
|codomain of $f$|, $f$ is a surjection. Thus $f$ is a bijection and, since the codomain and domain are
the same, $f$ is a permutation. In spite of all this, we don't know the function; for example, we
don't know $f(1)$, but only that it differs from all other values of $f$.

(c) We know the domain and codomain of $f$. From $f^{-1}(2)$ and $f^{-1}(4)$, we can determine the values
$f$ takes on the union $f^{-1}(2) \cup f^{-1}(4) = \underline{5}$. Thus we know $f$ completely. It is neither a surjection
nor an injection.

(d) This function is a surjection, cannot be an injection and has no values specified.

(e) This specification is nonsense. Since the image is a subset of the codomain, it cannot have more
than four elements.

(f) This specification is nonsense. The number of blocks in the coimage of $f$ equals the number of
elements in the image of $f$, which cannot exceed four.

A Boolean function $f$ is a map from $\{0, 1\}^n$ to $\{0, 1\}$. Thus the domain of $f$ is all $n$ long vectors of
zeroes and ones. Boolean functions arise in logic, where 0 is often replaced by F for "False" and 1
by T for "True." Boolean functions also arise in arithmetic, where 0 and 1 are digits of numbers in
binary representation. Mathematically, there is no difference between these interpretations; however,
the two different interpretations have slightly different notation associated with them.

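Example 2.17 Basic Boolean functions  Here are three functions from $\{0,1\}^2$ to $\{0,1\}$ in
two-line form:

$$p = \begin{pmatrix} (0,0) & (0,1) & (1,0) & (1,1) \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad
s = \begin{pmatrix} (0,0) & (0,1) & (1,0) & (1,1) \\ 0 & 1 & 1 & 1 \end{pmatrix}, \quad
f = \begin{pmatrix} (0,0) & (0,1) & (1,0) & (1,1) \\ 0 & 1 & 1 & 0 \end{pmatrix}.$$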

If we think of $x$ and $y$ as integers, we can write $p(x, y) = xy$ and, indeed, this is the notation that is
commonly used for $p$. To emphasize the multiplication, we might write $p(x, y) = x \cdot y$. Suppose that
$X$ and $Y$ are statements; e.g., $X$ may be the statement "It is cloudy." and $Y$ may be "It is hot."
We can build a more complicated statement from these two in many simple ways. One possibility is

    "It is cloudy and it is hot."

We could abbreviate this to "$X$ and $Y$." Let this compound statement be $Z$. Let $x = 0$ if $X$ is false
and $x = 1$ if $X$ is true. Define $y$ and $z$ similarly. (This is the True/False interpretation of 0 and 1
mentioned earlier.) You should be able to see that $z = p(x, y)$ because $Z$ is true if and only if both
$X$ and $Y$ are true. Not surprisingly, the function $p$ is called "and" in logic. Logicians sometimes write
$p(x, y) = x \wedge y$ instead of $p(x, y) = xy$.

What interpretation does the function $s(x, y)$ have? If we write it as an arithmetic function on
integers, it is a bit of a mess, namely, $x + y - xy$. In logic it has a very simple interpretation. Using
the notation of the previous paragraph, let $Z$ be the statement "$X$ or $Y$." Our usual understanding
of this is that $Z$ is true if at least one of $X$ and $Y$ is true. This understanding is equivalent to the
mathematical statement $s(x, y) = z$ since $s$ is 1 if at least one of its arguments is 1. Logicians call
this function "or." There are two common ways to denote it: $s(x, y) = x \vee y$ and $s(x, y) = x + y$.
The $x + y$ notation is a bit unfortunate because it does not mean that $x$ and $y$ are to be added as
integers. Since the plus sign notation is commonly used in discussion of circuits, we will use it here.
Remember:

    If $x, y \in \{0, 1\}$, then $x + y$ means OR, not addition.

How can our third function $f$ be interpreted? In terms of logic, $f(x, y)$ corresponds to a statement
which is true when exactly one of $X$ and $Y$ is true. Thus $f$ is called the exclusive or and we write
it as $f(x, y) = x \oplus y$. This function does not appear often in logic but it is important in binary
arithmetic: $x \oplus y$ is the unit's (rightmost) digit of the binary sum of the numbers $x$ and $y$. (Recall
that the binary sum of 1 and 1 is 10.)

We need a little more notation. Negation, $n(x)$, also called complementation, is an important
function. In two-line form it is $n = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$; i.e., $n(x)$ is true if and only if $x$ is false. In terms of the previous
notation, we can write $n(x) = x \oplus 1$. The notations $x'$ and $\bar{x}$ are both commonly used for $n(x)$. We
will use $x'$. You should be able to see quickly that $x'' = x$. □

Just as in ordinary algebra, we can combine the functions we introduced in the previous example.
Some formulas, such as $xy = yx$ and $(x + y)z = xz + yz$, which are true in ordinary arithmetic are
true here, too. There are also some surprises such as $xx = x + x = x$ and $x + xy = x$. For those
who like terminology, $xx = x$ and $x + x = x$ are called the idempotent laws for multiplication and
addition and $x + xy = x$ is called the absorption law for addition. How can we check to see if an
identity for Boolean expressions is correct? We can do this in one of two ways:

(a)  (Brute  force)  By  substituting  all  possible  combinations  of  zeroes  and  ones  for  the  variables,  we 
may  be  able  to  prove  that  both  sides  of  the  equal  sign  always  take  on  identical  values. 

(b)  (Algebra)  By  manipulation  using  known  identities,  we  may  be  able  to  prove  that  both  sides  of 
the  equal  sign  are  equal  to  the  same  expression. 

The  brute  force  method  is  guaranteed  to  always  work,  but  the  algebraic  method  is  often  quicker  if 
you  can  see  how  to  make  the  algebra  work  out.  Let's  see  how  these  methods  work. 

To verify $x + x = x$ by brute force, note first that when $x = 0$ the left side is $0 + 0$, which is 0,
and second that when $x = 1$ the left side is $1 + 1$, which is 1.
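The brute force method is mechanical enough to hand to a computer. Here is a minimal sketch; the helper names `OR`, `AND` and `identity_holds` are ours, and each side of an identity is passed in as a Python function of the variables.

```python
from itertools import product

def OR(x, y):
    return x | y      # the text's x + y

def AND(x, y):
    return x & y      # the text's xy

def identity_holds(lhs, rhs, num_vars):
    """Compare both sides on every assignment of zeroes and ones to the variables."""
    return all(lhs(*bits) == rhs(*bits)
               for bits in product((0, 1), repeat=num_vars))

# x + x = x (idempotent law for addition)
print(identity_holds(lambda x: OR(x, x), lambda x: x, 1))                   # True
# x + xy = x (absorption law for addition)
print(identity_holds(lambda x, y: OR(x, AND(x, y)), lambda x, y: x, 2))     # True
# (x + y)(x + z) = x + yz
print(identity_holds(lambda x, y, z: AND(OR(x, y), OR(x, z)),
                     lambda x, y, z: OR(x, AND(y, z)), 3))                  # True
# x + y = x XOR y is not an identity: the sides differ when x = y = 1
print(identity_holds(lambda x, y: OR(x, y), lambda x, y: x ^ y, 2))         # False
```

With $n$ variables the loop examines $2^n$ assignments, which is why the algebraic method is often quicker by hand.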

We now use algebra to verify $(x + y)(x + z) = x + yz$ using the identities we mentioned above.
We leave it to you to indicate which of the above identities is used at each step.

$$\begin{aligned}
(x + y)(x + z) &= x(x + z) + y(x + z) \\
&= (x + z)x + (x + z)y \\
&= xx + zx + xy + zy \\
&= x + xz + xy + yz \\
&= x + xy + yz \\
&= x + yz.
\end{aligned}$$

Of course, one would not normally write out all the steps. It would probably be shortened to

$$(x + y)(x + z) = x + xy + xz + yz = x + yz,$$

just as we take shortcuts in ordinary algebra.

Example 2.18 Truth tables  Two-line notation is a convenient way of proving identities by
brute force when we modify it slightly: Instead of listing one function with domain $\{0, 1\}^n$ and
codomain $\{0, 1\}$, we list several. This gives a way of building up various expressions. Truth tables
are a modification of this: rows and columns are interchanged. Here is the truth table for the basic
operations we introduced earlier.


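$$\begin{array}{cc|cccc}
x & y & xy & x + y & x \oplus y & x' \\ \hline
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 1 & 0 \\
1 & 1 & 1 & 1 & 0 & 0
\end{array}$$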

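Let's use truth tables to prove $(xy)' = x' + y'$. Here's the complete proof (the $(xy)'$ and $x' + y'$
columns are equal):

$$\begin{array}{cc|ccccc}
x & y & xy & (xy)' & x' & y' & x' + y' \\ \hline
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 1 & 0 & 1 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0
\end{array}$$

□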
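A truth table like this can also be generated rather than written by hand. The sketch below builds the rows for any list of labeled columns; the function `truth_table` and its argument layout are our own choices for illustration.

```python
from itertools import product

def truth_table(names, columns):
    """Build a truth table.

    names   -- the variable names, e.g. ("x", "y")
    columns -- (label, function) pairs; each function takes the variables in order
    Returns the header row and the list of data rows.
    """
    header = list(names) + [label for label, _ in columns]
    rows = []
    for bits in product((0, 1), repeat=len(names)):
        rows.append(list(bits) + [fn(*bits) for _, fn in columns])
    return header, rows

header, rows = truth_table(
    ("x", "y"),
    [
        ("(xy)'", lambda x, y: 1 - (x & y)),        # complement of the product
        ("x'+y'", lambda x, y: (1 - x) | (1 - y)),  # OR of the complements
    ],
)

print("  ".join(header))
for row in rows:
    print("  ".join(str(v).center(len(h)) for v, h in zip(row, header)))
```

The identity is proved when the last two columns agree in every row, which is what the test of equality on `rows` would confirm.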


Two important identities that we have not mentioned in full generality are DeMorgan's laws:

$$(x \cdot y \cdots z)' = x' + y' + \cdots + z', \qquad (2.3)$$

$$(x + y + \cdots + z)' = x' \cdot y' \cdots z'. \qquad (2.4)$$

We prove the first by a modification of brute force and leave the second for you. The product on the
left side of (2.3) is 1 if and only if $x = y = \cdots = z = 1$. Thus the left side of (2.3) is 0 if and only
if $x = y = \cdots = z = 1$. The right side of (2.3) is 0 if and only if $x' = y' = \cdots = z' = 0$, which is
equivalent to $x = y = \cdots = z = 1$. Thus, each side of (2.3) is 0 precisely when $x = y = \cdots = z = 1$.
Consequently both sides are 1 for all other values of $x, y, \ldots, z$. This completes the proof.

Example 2.19 An adder  One of the basic operations in a computer is addition. We can
imagine designing a device to add two numbers the way we learned to do it in school: First add
the digits in the unit's position, record the unit's digit of the sum and remember a carry digit (no
carry is a carry digit of zero). Second, add the digits in the ten's position and the carry digit, record
the unit's digit of the sum as the ten's digit of the answer and remember a carry digit. And so on.
Of course, in a computer we work with binary instead of base ten. For one possible design of an
adder, we need a device that takes in three binary digits and produces a sum unit's digit and a carry
digit. Represent such a device by a black box with inputs on the left, and outputs on the right so
that the top output is the sum unit's digit and the bottom output is the carry digit. To add two
$k$ digit binary numbers, we can combine $k$ of these boxes; for example, here's how we can add the
two 3 digit binary numbers $a_2 a_1 a_0$ and $b_2 b_1 b_0$ to get a sum $s_3 s_2 s_1 s_0$:

[Figure: three adder boxes chained through their carry digits.]


We need to specify the outputs of our black box as Boolean functions of the inputs. One way
to do this is with a truth table. Another way to do it is by writing down the equations. They are
shown in Figure 2.1, with $x$, $y$ and $z$ as inputs and $s$ and $c$ as the sum and carry, respectively. (You
should be able to derive the truth table from the definition of $s$ and $c$, but you may not see how to
write down the algebraic equations.)

$$\begin{array}{ccc|cc}
x & y & z & c & s \\ \hline
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 \\
0 & 1 & 1 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 1 & 1
\end{array}$$

$$s = x \oplus y \oplus z = x'y'z + x'yz' + xy'z' + xyz$$

$$c = xy + xz + yz$$

Figure 2.1 The truth table and algebraic representations of a three bit binary adder. The sum of the bits
$x$, $y$ and $z$ is the two bit binary number $cs$.

What form should we use: truth table or algebraic equations? The answer depends on what we
plan to do with the function. If we plan to manipulate it, then an algebraic form is usually best. If
we plan to use it in some computer aided circuit design program, then whatever form the program
requires is best. □
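The design in this example can be simulated in a few lines. The following sketch models one black box with the equations of Figure 2.1 and chains $k$ of them; the function names and the choice to store bits least significant first are our assumptions, not part of the text.

```python
def full_adder(x, y, z):
    """One black box: three input bits -> (sum bit s, carry bit c)."""
    s = x ^ y ^ z                        # s = x XOR y XOR z
    c = (x & y) | (x & z) | (y & z)      # c = xy + xz + yz
    return s, c

def ripple_add(a_bits, b_bits):
    """Add two equal-length binary numbers given as lists of bits,
    least significant bit first. Returns the sum bits, also LSB first."""
    carry = 0                            # no carry into the unit's position
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)                    # the last carry is the top digit of the sum
    return out

# 110 + 111 = 1101 in binary (6 + 7 = 13); bits are listed LSB first
print(ripple_add([0, 1, 1], [1, 1, 1]))  # [1, 0, 1, 1]
```

For two 3 digit inputs the result has four bits, matching the sum $s_3 s_2 s_1 s_0$ in the text.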

How can we convert a truth table to an algebraic expression? One method is disjunctive normal
form. A Boolean function is in disjunctive normal form if it is a sum (OR) of products (ANDs) with
each factor in each product being either a variable or its complement. For example, the functions
$x'y'z + x'yz' + xy'z' + xyz$ and $xy + xz + yz$ in Figure 2.1 are in disjunctive normal form, but $x \oplus y \oplus z$
is not.

In Example 7.3 (p. 200) we'll prove by induction that every Boolean function can be written in
disjunctive normal form. We now give a different proof by showing how to convert the truth table
for a function into disjunctive normal form. Let the function be $f(x_1, x_2, \ldots, x_n)$. Our disjunctive
normal form will contain one term for each row of the truth table in which $f = 1$. Each term will be
a product of $n$ factors. If the row says $x_i = 1$, then $x_i$ appears in the product; otherwise, $x_i'$ appears
in the product. For example, look at the function $s$ in Figure 2.1. There are four rows in the truth
table with $s = 1$. The first such row has $x = y = 0$ and $z = 1$, which gives us the first term, $x'y'z$,
in our disjunctive normal form for $s$.

It is rather easy to prove that the truth table, $T'$, of the resulting disjunctive normal form, $f$,
equals the truth table, $T$, it was derived from. A row of $T'$ will be 1 if and only if $f$ is 1, which
happens if and only if some term of $f$ is 1. On the other hand, a term of $f$ is 1 if and only if $x_1, \ldots, x_n$
take on the values that led to that term. Those values were associated with a row in $T$ for which
the function is 1.
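The conversion from a truth table to disjunctive normal form is a short loop. In the sketch below a term is stored as the row of bits that produced it; the function names, and the decision to print the constant function 0 as "0", are our own conventions.

```python
from itertools import product

def dnf_terms(f, n):
    """Rows of the truth table on which f is 1; each row gives one product term."""
    return [bits for bits in product((0, 1), repeat=n) if f(*bits) == 1]

def format_dnf(terms, names):
    """Write x when the row says x = 1 and x' when it says x = 0."""
    if not terms:
        return "0"   # assumption: an empty sum is shown as the constant 0
    return " + ".join(
        "".join(name if bit else name + "'" for name, bit in zip(names, bits))
        for bits in terms
    )

def eval_dnf(terms, bits):
    """A factor x or x' is 1 exactly when the input bit equals the row's bit,
    so a term is 1 only on its own row; the sum is 1 if some term is 1."""
    return int(any(all(b == t for b, t in zip(bits, term)) for term in terms))

s = lambda x, y, z: x ^ y ^ z                      # sum bit of Figure 2.1
c = lambda x, y, z: (x & y) | (x & z) | (y & z)    # carry bit of Figure 2.1

print(format_dnf(dnf_terms(s, 3), "xyz"))   # x'y'z + x'yz' + xy'z' + xyz
print(format_dnf(dnf_terms(c, 3), "xyz"))   # x'yz + xy'z + xyz' + xyz
```

Comparing `eval_dnf` with the original function on all $2^n$ inputs is the computational counterpart of the argument that $T' = T$.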

Let's  return  to  Figure  2.1  and  look  at  a  disjunctive  normal  form  for  c.  From  the  truth  table  we 
have 

$$c = x'yz + xy'z + xyz' + xyz, \qquad (2.5)$$

which  is  not  the  same  as  the  disjunctive  normal  form  c  =  xy  +  xz  +  yz  given  earlier.  How  can  this 
be?  There  need  not  be  a  unique  disjunctive  normal  form  for  a  function.  Is  one  form  better  than  the 
other?  Yes,  since  it  has  fewer  and  simpler  terms  than  (2.5),  c  =  xy  +  xz  +  yz  requires  less  hardware 
to  implement  as  a  disjunctive  normal  form  than  does  (2.5).  How  can  we  find  the  "best"  disjunctive 
normal  form?  This  is  a  hard  question — even  defining  "best"  may  be  difficult. 

There  are  other  algebraic  forms  besides  disjunctive  normal  form  which  are  important.  With 
current  computer  chip  technology,  the  most  important  is  "NAND,"  which  is  defined  by 

$$\mathrm{NAND}(x_1, x_2, \ldots, x_n) = (x_1 x_2 \cdots x_n)', \qquad n \ge 1;$$

that is, "NOT the AND of the $x_i$'s." "NAND" is a contraction of "NOT AND." By one of DeMorgan's
laws,

$$\mathrm{NAND}(x_1, x_2, \ldots, x_n) = x_1' + x_2' + \cdots + x_n'. \qquad (2.6)$$

It is a simple matter to prove that every Boolean function can be built up out of NANDs: We will
do this by proving that every disjunctive normal form can be implemented by using NANDs. Let
the disjunctive normal form be $f = p_1 + p_2 + \cdots + p_m$, where $p_i = u_{i,1} u_{i,2} \cdots u_{i,k_i}$ and each $u$ is an
input variable or its complement. To complete the proof it suffices to note that

    $f = \mathrm{NAND}(p_1', \ldots, p_m')$  by (2.6) and $p'' = p$;

    $p_i' = \mathrm{NAND}(u_{i,1}, \ldots, u_{i,k_i})$  by the definition of NAND;

    $x_t' = \mathrm{NAND}(x_t)$.

Thus $f$ is a NAND of NANDs of $u_{i,j}$'s and each $u_{i,j}$ is a variable or its NAND. For example, from
Figure 2.1 we have

    $c = xy + xz + yz = \mathrm{NAND}\bigl(\mathrm{NAND}(x, y), \mathrm{NAND}(x, z), \mathrm{NAND}(y, z)\bigr)$

    $s = x'y'z + \cdots = \mathrm{NAND}\bigl(\mathrm{NAND}(\mathrm{NAND}(x), \mathrm{NAND}(y), z), \ldots\bigr)$.
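The two NAND expressions for the adder can be checked by brute force. This is a sketch only: `NAND` is written as a Python function of any number of bits, and the names `carry_via_nand` and `sum_via_nand` are ours.

```python
from itertools import product

def NAND(*xs):
    """NOT of the AND of the inputs; with one input, NAND(x) is the complement of x."""
    return 1 - int(all(xs))

def carry_via_nand(x, y, z):
    # c = xy + xz + yz = NAND(NAND(x, y), NAND(x, z), NAND(y, z))
    return NAND(NAND(x, y), NAND(x, z), NAND(y, z))

def sum_via_nand(x, y, z):
    # s = x'y'z + x'yz' + xy'z' + xyz, with each complement computed once
    # as a one-input NAND and then reused
    nx, ny, nz = NAND(x), NAND(y), NAND(z)
    return NAND(NAND(nx, ny, z), NAND(nx, y, nz), NAND(x, ny, nz), NAND(x, y, z))

for x, y, z in product((0, 1), repeat=3):
    assert carry_via_nand(x, y, z) == (x & y) | (x & z) | (y & z)
    assert sum_via_nand(x, y, z) == x ^ y ^ z
print("both NAND forms agree with Figure 2.1")
```

Here the complements `nx`, `ny` and `nz` are each computed once and shared among the terms, which is one way of handling repeated subexpressions.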

Does  this  direct  translation  of  disjunctive  normal  form  give  us  the  simplest  representation  by 
NANDs?  (Of  course,  there  may  be  several  disjunctive  normal  forms,  so  we  would  presumably  start 
with  the  "simplest.")  As  you  may  suspect,  the  answer  is  "No."  In  fact,  the  question  is  not  even 
well posed. One issue that arises immediately is what about repeated expressions; e.g., if $x_1'$ appears
several times, must we compute $\mathrm{NAND}(x_1)$ several times, or may we just compute it once and reuse
it?  We  may  usually  reuse  it,  but  there  are  sometimes  hardware  constraints  that  do  not  allow  us 
to.  This  is  much  like  the  problem  of  computing  any  complicated  algebraic  function  in  which  some 
subexpression  appears  more  than  once:  If  you  are  clever,  you  will  only  compute  that  subexpression 
once.  We'll  not  pursue  the  complex  problem  of  representation  by  NANDs. 

Exercises 


2.4.1. Using only the identities in the text, prove $x(x + y) = x$ by algebra. (This is the absorption law for
multiplication.)

2.4.2. Prove that $x \oplus y = x' \oplus y'$.

2.4.3. The distributive law for multiplication, $\cdot$, over addition, $+$, states that $x \cdot (y + z) = x \cdot y + x \cdot z$. State
the following distributive laws. If the law is true, prove it. If the law is false, give a counterexample.
Hint. Truth tables can be used to find proofs and counterexamples.

(a) addition, $+$, over multiplication, $\cdot$;

(b) multiplication, $\cdot$, over exclusive or, $\oplus$;

(c) addition, $+$, over exclusive or, $\oplus$;

(d) exclusive or, $\oplus$, over multiplication, $\cdot$.

2.4.4. Define $\mathrm{NOR}(x, y, \ldots, z) = (x + y + \cdots + z)'$. Prove that every Boolean function can be expressed
using NORs.

2.4.5. Write each of the following in disjunctive normal form. Try to obtain as simple a disjunctive normal
form as possible.

(a) $(x \oplus y)(x + y)$.

(b) $(x + y) \oplus z$.

(c) $(x + y + z) \oplus z$.

(d) $(xy) \oplus z$.

2.4.6. Let $f(x, y, z)$ equal $z$ unless $x = y$, in which case $f(x, y, z) = x$. Write $f(x, y, z)$ in disjunctive normal
form.

2.4.7. Let $f(x, y, z, w)$ equal $w$ unless $x = y = z$, in which case $f(x, y, z, w) = x$. Write $f(x, y, z, w)$ in
disjunctive normal form. Try to obtain as simple a form as you can.

2.4.8. Let $f(x, y, z, w)$ equal $zw$ if $x = y$ and $z + w$ otherwise. Write $f(x, y, z, w)$ in disjunctive normal form.
Try to obtain as simple a form as you can.

Notes  and  References 


Further discussion of Boolean functions and circuit design at the level of this text can be found
in the texts by Gill [1] and McEliece et al. [2].

1. Arthur Gill, Applied Algebra for the Computer Sciences, Prentice-Hall (1976).

2. Robert J. McEliece, Robert B. Ash and Carol Ash, Introduction to Discrete Mathematics, Random
House (1989).


CHAPTER  3 


Decision  Trees 


Introduction 


In many situations one needs to make a series of decisions. This leads naturally to a structure called
a "decision tree," which we will define shortly. Decision trees provide a geometrical framework for
organizing the decisions. The important aspect is the decisions that are made. Everything we do in
this chapter could be rewritten to avoid the use of trees; however, trees

•  give  us  a  powerful  intuitive  basis  for  viewing  the  problems  of  this  chapter, 

•  provide  a  language  for  discussing  the  material, 

•  allow  us  to  view  the  collection  of  all  decisions  in  an  organized  manner. 

We'll  begin  by  studying  some  basic  concepts  concerning  decision  trees.  Next  we'll  relate  decision 
trees  to  the  concepts  of  "ranking"  and  "unranking,"  ideas  which  allow  for  efficient  computer  storage 
of  data  which  is  a  priori  not  indexed  in  a  simple  manner.  Finally  we'll  study  decision  trees  which 
contain  some  bad  decisions;  i.e.,  decisions  that  do  not  lead  to  solutions. 

Decision  trees  are  particularly  useful  in  the  "local"  study  of  recursive  procedures.  In  Sections  7.3 
(p.  210)  and  9.3  (p.  259)  we'll  apply  this  idea  to  various  algorithms. 

Make sure you're familiar with the concepts in Chapter 2, especially the first section, the
definition of a permutation and the bijections in Theorem 2.4 (p. 52).


3.1   Basic  Concepts  of  Decision  Trees 


Decision trees provide a method for systematically listing a variety of functions. We'll begin by
looking at a couple of these. The simplest general class of functions to list is the entire set $\underline{n}^{\underline{k}}$. We
can create a typical element in the list by choosing an element of $\underline{n}$ and writing it down, choosing
another element (possibly the same as before) of $\underline{n}$ and writing it down next, and so on until we
have made $k$ decisions. This generates a function in one line form sequentially: First $f(1)$ is chosen,
then $f(2)$ is chosen and so on. We can represent all possible decisions pictorially by writing down
the decisions made so far and then some downward edges indicating the possible choices for the next
decision.

The lefthand part of Figure 3.1 illustrates this way of generating a function in $\underline{2}^{\underline{3}}$ sequentially.
It's called the decision tree for generating the functions in $\underline{2}^{\underline{3}}$. Each line in the lefthand figure is
labeled with the choice of function value to which it corresponds. Note that the labeling does not
completely describe the corresponding decision: we should have used something like "Choose 1 for
the value of f(2)" instead of simply "1" on the line from 1 to 11. At the end of each line is the
function that has been built up so far. We've omitted commas, so 211 means 2,1,1.

Figure 3.1 (a) The decision tree for all functions in $\underline{2}^{\underline{3}}$, that is, 3-long lists from {1,2} with repeats
allowed, is on the left. We've omitted commas in the lists. Functions are written in one line notation. (b) The
underlying tree with all labels removed is on the right. The leaves of (a), read left to right, are
111 112 121 122 211 212 221 222.

To  the  right  of  the  figure  for  generating  the  functions  is  the  same  structure  without  any  labels. 
The  dots  (•)  are  called  nodes  or  vertices  and  the  lines  connecting  them  are  called  edges.  Sometimes 
it is more convenient to specify an edge by giving its two end points; for example, the edge from 1
to  12  in  the  figure  can  be  described  uniquely  as  (1,12).  The  nodes  with  no  edges  leading  down  are 
called  the  leaves.  The  entire  branching  structure  is  called  a  tree  or,  more  properly,  an  ordered  rooted 
tree.  The  topmost  node  is  called  the  root. 

In  this  terminology,  the  labels  on  the  vertices  show  the  partial  function  constructed  so  far  when 
the  vertex  has  been  reached.  The  edges  leading  out  of  a  vertex  are  labeled  with  all  possible  decisions 
that  can  be  made  next,  given  the  partial  function  at  the  vertex.  We  labeled  the  edges  so  that  the 
labels  on  edges  out  of  each  vertex  are  in  order  when  read  left  to  right.  The  leaves  are  labeled  with 
the  finished  functions.  Notice  that  the  labels  on  the  leaves  are  in  lexicographic  order.  If  we  agree 
to  label  the  edges  from  each  vertex  in  order,  then  any  set  of  functions  generated  sequentially  by 
specifying f(i) at the ith step will be in lex order.
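As a rough illustration of sequential generation, here is a minimal Python sketch. The function name `list_functions` is ours, not the text's, and the sketch assumes the edges out of every vertex are tried in the order 1, ..., n. Under that assumption the leaves come out in lex order.

```python
def list_functions(n, k):
    """Return all k-long lists over {1, ..., n}, in the order a decision
    tree visits its leaves when edges are labeled 1, ..., n left to right."""
    leaves = []

    def descend(partial):
        # A vertex is labeled with the partial function built so far.
        if len(partial) == k:
            leaves.append(tuple(partial))  # reached a leaf
            return
        for value in range(1, n + 1):      # edges out of this vertex, left to right
            descend(partial + [value])

    descend([])
    return leaves


functions = list_functions(2, 3)
print(["".join(map(str, f)) for f in functions])
```

The printed list can be compared with the leaves of Figure 3.1, and `functions == sorted(functions)` confirms the lex order claim for this small case.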

To  create  a  single  function  we  start  at  the  root  and  choose  downward  edges  (i.e.,  make  decisions) 
until  we  reach  a  leaf.  This  creates  a  path  from  the  root  to  a  leaf.  We  may  describe  a  path  in  any  of 
the  following  ways: 

•  the sequence of vertices $v_0, v_1, \ldots, v_m$ on the path from the root $v_0$ to the leaf $v_m$;

•  the sequence of edges $e_1, e_2, \ldots, e_m$, where $e_i = (v_{i-1}, v_i)$ is the edge connecting $v_{i-1}$ to $v_i$;

•  the integer sequence of decisions $D_1, D_2, \ldots, D_m$, where $e_i$ has $D_i$ edges to the left of it leading
out from $v_{i-1}$.

Note that if a vertex has k edges leading out from it, the decisions are numbered 0, 1, ..., k - 1. We
will  find  the  concept  of  a  sequence  of  decisions  useful  in  stating  our  algorithms  in  the  next  section. 
We  illustrate  the  three  approaches  by  looking  at  the  leaf  2,1,2  in  Figure  3.1: 

•  the  vertex  sequence  is  root,  2,  21,  212; 

•  the  edge  sequence  is  2,  1,  2; 

•  the  decision  sequence  is  1,  0,  1. 
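The three descriptions of a path can be computed side by side. The sketch below is only illustrative: `describe_path` and its `choices` parameter are hypothetical names, and it assumes every vertex offers the same edge labels in the same left-to-right order, as in Figure 3.1.

```python
def describe_path(leaf, choices):
    """Describe the root-to-leaf path for `leaf` in three ways.

    `choices` lists the edge labels available at every vertex, left to right
    (an assumption that fits the tree of Figure 3.1).
    """
    vertices = ["root"]
    edges = []
    decisions = []
    partial = ""
    for value in leaf:
        partial += str(value)
        vertices.append(partial)                 # label of the vertex reached
        edges.append(value)                      # label on the edge taken
        decisions.append(choices.index(value))   # number of edges to its left
    return vertices, edges, decisions


vertices, edges, decisions = describe_path((2, 1, 2), choices=[1, 2])
print(vertices, edges, decisions)
```

For the leaf 2,1,2 this reproduces the vertex, edge and decision sequences listed above.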

The following example uses a decision tree to list a set of patterns which are then used to solve
a counting problem.

Figure 3.2 The decision tree for 5-long C-V patterns.


#V's   #CC's   ways to fill              patterns
 1      2      (20 × 19)^2 × 6           CCVCC
 2      0      20^3 × 6^2                CVCVC
 2      1      (20 × 19) × 20 × 6^2      CCVCV  CVCCV  VCCVC  VCVCC
 3      0      20^2 × 6^3                VCVCV

Figure 3.3 Grouping and counting the patterns.


Example 3.1 Counting words Using the 26 letters of the alphabet and considering the letters
AEIOUY to be vowels, how many five letter "words" (i.e., five long lists of letters) are there subject
to the following constraints?

(a)  No  vowels  are  ever  adjacent. 

(b)  There  are  never  three  consonants  adjacent. 

(c)  Adjacent  consonants  are  always  different. 

To  start  with,  it  would  be  useful  to  have  a  list  of  all  the  possible  patterns  of  consonants  and 
vowels;  e.g.,  CCVCV  (with  C  for  consonant  and  V  for  vowel)  is  possible  but  CVVCV  and  CCCVC 
violate  conditions  (a)  and  (b)  respectively  and  so  are  not  possible.  We'll  use  a  decision  tree  to 
generate these patterns in lex order. Of course, a pattern CVCCV can be thought of as a function
f where f(1) = C, f(2) = V, ..., f(5) = V. In Example 3.2 we'll use a decision tree for a counting
problem  in  which  there  is  not  such  a  straightforward  function  interpretation. 

We  could  simply  try  to  list  the  patterns  (functions)  directly  without  using  a  decision  tree.  The 
decision  tree  approach  is  preferable  because  we  are  less  likely  to  overlook  something.  The  resulting 
tree is shown in Figure 3.2. At each vertex there are potentially two choices, but at some vertices only
one is possible because of conditions (a) and (b) above. Thus there are one or two decisions at each
vertex.  You  should  verify  that  this  tree  lists  all  possibilities  systematically. 

*        *        *       Stop  and  think  about  this!        *        *  * 

With  enough  experience,  one  can  list  the  leaves  of  a  fairly  simple  decision  tree  like  this  one  without 
needing  to  draw  the  entire  tree.  It's  easier  to  make  mistakes  using  that  short  cut,  so  we  don't 
recommend  it — especially  on  exams! 

We can now group the patterns according to how many vowels and how many adjacent consonants
they have, since these two pieces of information are enough to determine how many ways
the pattern can be filled in. The results are given in Figure 3.3. To get the answer, we multiply the
"ways to fill" in each row by the number of patterns in that row and sum over all four rows. The
answer is 20^2 × 6 × 973. □
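The count in Example 3.1 can be checked by brute force over patterns. The following is a sketch under our own naming (`valid`, `ways_to_fill`); it filters all 32 C-V strings rather than walking the decision tree, so it is a cross-check and not the method of the example.

```python
from itertools import product

CONSONANTS, VOWELS = 20, 6   # 26 letters with AEIOUY counted as vowels


def valid(pattern):
    """Conditions (a) and (b): no VV and no CCC anywhere in the pattern."""
    return "VV" not in pattern and "CCC" not in pattern


def ways_to_fill(pattern):
    """Number of words fitting a valid pattern, using condition (c):
    a consonant that follows a consonant must differ from it."""
    ways = 1
    previous = ""
    for kind in pattern:
        if kind == "V":
            ways *= VOWELS
        elif previous == "C":
            ways *= CONSONANTS - 1
        else:
            ways *= CONSONANTS
        previous = kind
    return ways


# product("CV", repeat=5) yields strings in lex order with C before V.
patterns = ["".join(p) for p in product("CV", repeat=5) if valid("".join(p))]
total = sum(ways_to_fill(p) for p in patterns)
print(patterns)
print(total, total == 20**2 * 6 * 973)
```

Seven patterns survive the filter, and the total agrees with 20^2 × 6 × 973 = 2,335,200.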
Figure 3.4 A decision tree for two 2 pair hands.


The next two examples also deal with patterns. The first example looks at patterns of overlap
between two hands of cards and uses this to count hands. In contrast to the previous example, no
locations are involved. The second example looks at another word problem. We take the patterns to
be the number of repetitions of letters rather than the location of types of letters.

Example  3.2  Two  hands  of  cards  In  Chapter  1  we  studied  various  problems  involving  a  hand 
of  cards.  Now  we  complicate  matters  by  forming  more  than  one  hand  on  the  same  deal  from  the 
deck.  How  many  ways  can  two  5  card  hands  be  formed  so  that  each  hand  contains  2  pairs  (and  a 
fifth  card  that  has  a  different  value)? 

The  problem  can  be  solved  by  forming  the  first  hand  and  then  forming  the  second,  since  the 
number  of  choices  for  the  second  hand  does  not  depend  on  what  the  first  hand  is  as  long  as  we  know 
it  is  a  hand  with  2  pairs.  We  solved  the  two  pair  problem  in  Example  1.16  (p.  22),  so  we  know  that 
the  first  hand  can  be  formed  in  123,552  ways. 

Forming the second hand is complicated by the fact that the first hand has used up some of the
cards in the deck. As a result, we must consider different cases according to the amount of overlap
between the first and second hands. We'll organize the possibilities by using a decision tree. Let $P_i$
be the set of values of the pairs in the ith hand; e.g., we might have $P_1 = \{3, Q\}$ and $P_2 = \{2, 3\}$, in
which case the hands have one pair value in common. Our first decision will be the value of $|P_1 \cap P_2|$,
which must be 0, 1 or 2. Our next decision will be based on whether or not the value of the unpaired
card in the first hand is in $P_2$; i.e., whether or not a pair in the second hand has the same value as
the nonpair card in the first hand. We'll label the edges Y and N according as this is true or not.
The decision tree is shown in Figure 3.4, where we've labeled the leaves A-E for convenience.

We'll prove that the number of hands that correspond to each of the leaves is

A:  $\binom{10}{2}\binom{4}{2}^2 (52 - 8 - 5)$ = 63,180,
B:  $10\binom{3}{2}\binom{4}{2} (52 - 8 - 4)$ = 7,200,
C:  $2 \times 10\binom{4}{2} (52 - 8 - 3)$ = 4,920,
D:  $2\binom{3}{2} (52 - 8 - 2)$ = 252,
E:  $52 - 8 - 1$ = 43,

giving a total of 75,595 choices for the second hand. Multiplying this by 123,552 (the number of
ways of forming the first hand) we find that there are somewhat more than 9 × 10^9 possibilities for
two hands. Of course, if the order of the hands is irrelevant, this must be divided by 2.

We'll derive the values for leaves A and D and let you do the rest. For A, we choose the two
pairs not using any of the three values in the first hand. As in Example 1.16, this gives us $\binom{10}{2}\binom{4}{2}^2$
possibilities. We then choose the single card. To do the latter, we must avoid the values of the two
pairs already chosen in the second hand and the cards in the first hand. Thus there are 52 - 8 - 5
choices for this card. For D, we choose which element of $P_1$ is to be in $P_2$ (2 choices). The suits
of that pair in the second hand are determined since only two cards with that value are left in the
deck after selecting the first hand. Next we choose the suits for the other pair ($\binom{3}{2}$ choices). Finally
we select the fifth card. We must avoid (a) the values of the two pairs already chosen in the second
hand and (b) the pair that was used in the first hand which doesn't have the same value as either
of the pairs in the second hand. □

Figure 3.5 The decision tree for forming 8-letter words using ERRONEOUSNESS in Example 3.3. The
first level of edges from the root indicates the number of different letters used three times; the next level,
two times; and the bottom level, once.
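The leaf counts in Example 3.2 are easy to check numerically. In the sketch below the factorizations for leaves A and D follow the derivation in the text; those for B, C and E are our own reading of the cases, chosen to match the stated values 7,200, 4,920 and 43. The expression for the first hand is the standard two-pair count and is assumed to be what Example 1.16 computes.

```python
from math import comb

# Two-pair hands from a full deck: 2 pair values, suits for each pair,
# then a fifth card of a different value (44 cards remain eligible).
first_hand = comb(13, 2) * comb(4, 2) ** 2 * 44

# Second-hand counts by leaf; the last factor is the choice of fifth card.
leaves = {
    "A": comb(10, 2) * comb(4, 2) ** 2 * (52 - 8 - 5),
    "B": 10 * comb(3, 2) * comb(4, 2) * (52 - 8 - 4),
    "C": 2 * 10 * comb(4, 2) * (52 - 8 - 3),
    "D": 2 * comb(3, 2) * (52 - 8 - 2),
    "E": 52 - 8 - 1,
}
second_hand = sum(leaves.values())
print(leaves)
print(first_hand, second_hand, first_hand * second_hand)
```

Dividing the final product by 2 gives the count when the order of the two hands is irrelevant.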

Example 3.3 Words from a collection of letters Review the problem discussed in Examples
1.11 (p. 13) and 1.19 (p. 24), where we counted possible "words" that could be made using the
letters in ERROR. In Example 1.11, we counted by listing words one at a time. In Example 1.19, we
counted by grouping words according to the number of occurrences of each letter. Both approaches
could be carried out using a decision tree; for example, we could first choose $m_1$, then $m_2$ and
finally $m_3$ in Example 1.19. Let's consider a bigger problem where it will be useful to form even larger
groups of words than in Example 1.19. (Larger groups result in fewer total groups and so lead to
easier counting.) We'll group words by patterns, as we did in Example 3.1, but the nature of the
patterns will be different: it will be more like what we did for hands of cards in Chapter 1. Let's
compute the number of 8-letter words that can be formed using the letters in ERRONEOUSNESS,
using a letter no more often than it appears in ERRONEOUSNESS. We have three each of
E and S; two each of O, N, and R; and one U. We will make three levels of decisions. The first level
will  be  how  many  letters  to  use  three  times,  the  second  level  how  many  to  use  twice,  and  the  third 
level  how  many  to  use  once.  Of  course,  since  the  total  number  of  letters  used  must  be  8  to  create  our 
8-letter  word,  the  choice  on  the  third  level  is  forced.  Since  there  are  only  6  distinct  letters,  we  must 
either  use  at  least  one  letter  3  times  or  use  at  least  two  letters  twice.  The  decision  tree  is  shown  in 
Figure  3.5. 

Why did we go from three times to twice to once in the levels instead of, for example, in the
reverse order?

*       *       *       Stop  and  think  about  this!        *       *  * 

Had  we  first  chosen  how  many  letters  were  to  appear  once,  the  number  of  ways  to  choose  letters 
to  appear  twice  would  depend  on  whether  or  not  we  had  used  U.  The  choices  for  triples  would  be 
even more complex. By starting with the most repeated letters and working down to the unrepeated
ones,  this  problem  does  not  arise.  Had  we  made  decisions  in  the  reverse  order  just  described,  there 
would  have  been  23  leaves.  You  might  like  to  try  constructing  the  tree. 

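We can easily carry out the calculations for each of the eight leaves. For example, at leaf F, we
choose 1 of the 2 possibilities at the first choice AND 2 of the 4 possibilities at the second choice
AND 1 of the 3 possibilities at the third choice. This gives $\binom{2}{1}\binom{4}{2}\binom{3}{1}$. Listing the choices at each
time in alphabetical order, they might have been E; N, S; O. AND now we choose locations for each
of these letters. This can be counted by a multinomial coefficient: $\binom{8}{3,2,2,1}$. In a similar manner, we
have the following results for the eight cases. Each leaf is identified by its numbers of triples,
doubles and singles.

A (0, 2, 4):  $\binom{2}{0}\binom{5}{2}\binom{4}{4}\binom{8}{2,2,1,1,1,1}$ = 100,800
B (0, 3, 2):  $\binom{2}{0}\binom{5}{3}\binom{3}{2}\binom{8}{2,2,2,1,1}$ = 151,200
C (0, 4, 0):  $\binom{2}{0}\binom{5}{4}\binom{2}{0}\binom{8}{2,2,2,2}$ = 12,600
D (1, 0, 5):  $\binom{2}{1}\binom{4}{0}\binom{5}{5}\binom{8}{3,1,1,1,1,1}$ = 13,440
E (1, 1, 3):  $\binom{2}{1}\binom{4}{1}\binom{4}{3}\binom{8}{3,2,1,1,1}$ = 107,520
F (1, 2, 1):  $\binom{2}{1}\binom{4}{2}\binom{3}{1}\binom{8}{3,2,2,1}$ = 60,480
G (2, 0, 2):  $\binom{2}{2}\binom{3}{0}\binom{4}{2}\binom{8}{3,3,1,1}$ = 6,720
H (2, 1, 0):  $\binom{2}{2}\binom{3}{1}\binom{3}{0}\binom{8}{3,3,2}$ = 1,680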

Adding  these  up,  we  get  454,440.  This  is  about  as  simple  as  we  can  make  the  problem  with  the  tools 
presently  at  our  disposal.  In  Example  11.6  (p.  319),  we'll  see  how  to  immediately  identify  the  answer 
as the coefficient of a term in the product of some polynomials. □
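The case analysis of Example 3.3 can be reproduced mechanically. The sketch below is ours: it walks the three levels of decisions (triples, doubles, then the forced number of singles) and, as an independent check that is not part of the example, recounts the words by adding one letter type at a time and choosing its positions.

```python
from collections import Counter
from math import comb, factorial

letters = Counter("ERRONEOUSNESS")
triple_ok = sum(1 for c in letters.values() if c >= 3)   # E, S
double_ok = sum(1 for c in letters.values() if c >= 2)   # E, S, O, N, R
distinct = len(letters)                                   # 6 letters in all
LENGTH = 8

leaf_counts = {}
for t in range(triple_ok + 1):                 # letters used three times
    for d in range(double_ok - t + 1):         # letters used twice
        s = LENGTH - 3 * t - 2 * d             # letters used once (forced)
        if s < 0 or s > distinct - t - d:
            continue                           # no such leaf
        choose = (comb(triple_ok, t) * comb(double_ok - t, d)
                  * comb(distinct - t - d, s))
        arrange = factorial(LENGTH) // (factorial(3) ** t * factorial(2) ** d)
        leaf_counts[(t, d, s)] = choose * arrange

total = sum(leaf_counts.values())

# Independent check: ways[n] = number of n-letter words from the letter types
# processed so far; a new type occupies j of the n positions.
ways = [1] + [0] * LENGTH
for available in letters.values():
    ways = [sum(ways[n - j] * comb(n, j) for j in range(min(available, n) + 1))
            for n in range(LENGTH + 1)]

print(leaf_counts)
print(total, ways[LENGTH])
```

Both computations should give the total stated in the example, and the entry for (1, 2, 1) corresponds to leaf F.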

In  the  previous  examples  we  were  interested  in  counting  the  number  of  objects  of  some  sort — 
words,  hands  of  cards.  Decision  trees  can  also  be  used  to  provide  an  orderly  listing  of  objects.  The 
listing  is  simply  the  leaves  of  the  tree  read  from  left  to  right.  Figure  3.6  shows  the  decision  tree  for  the 
permutations of 3 written in one line form. Recall that we can think of a permutation as a bijection
$f : \underline{3} \to \underline{3}$ and its one line form is f(1), f(2), f(3). Since we chose the values of f(1), f(2) and f(3)
in that order, the permutations are listed lexicographically; that is, in "alphabetical" order like a
dictionary  only  with  numbers  instead  of  letters.  On  the  right  of  Figure  3.6,  we've  abbreviated  the 
decision  tree  a  bit  by  shrinking  the  edges  coming  from  vertices  with  only  one  decision  and  omitting 
labels  on  nonleaf  vertices.  As  you  can  see,  there  is  no  "correct"  way  to  label  a  decision  tree.  The 
intermediate  labels  are  simply  a  tool  to  help  you  correctly  list  the  desired  objects  (functions  in  this 
case)  at  the  leaves.  Sometimes  one  may  even  omit  the  function  at  the  leaf  and  simply  read  it  off  the 
tree  by  looking  at  the  labels  on  the  edges  or  vertices  associated  with  the  decisions  that  lead  from 
the  root  to  the  leaf.  An  example  of  such  a  tree  appears  in  Figure  3.16  (p.  90). 

Definition 3.1 Rank The rank of a leaf is the number of leaves to its left. Thus, if there
are n leaves the rank is an integer between 0 and n - 1, inclusive.

In Figure 3.6, the leaf 1,2,3 has rank 0 and the leaf 3,1,2 has rank 4. Rank is an important tool
for storing information about objects and for generating objects at random. See the next section for
more discussion.

Figure 3.6 The decision tree for the permutations of 3 in lex order. (a) The full tree is on the left side.
(b) An abbreviated tree is on the right. We've omitted commas and written permutations in one line form.
The leaves, read left to right, are 123 132 213 231 312 321.
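To make the definition of rank concrete, the following sketch (ours, not the text's) lists the leaves of the lex order tree for permutations of 3 and reads ranks off as list positions. It relies on `itertools.permutations` yielding tuples in lex order when its input is sorted; the names `rank_of` and `leaves` are hypothetical.

```python
from itertools import permutations

# Leaves of the lex order decision tree, read left to right.
leaves = list(permutations([1, 2, 3]))

# The rank of a leaf is the number of leaves to its left, i.e. its index.
rank_of = {leaf: index for index, leaf in enumerate(leaves)}

print(leaves)
print(rank_of[(1, 2, 3)], rank_of[(3, 1, 2)])
```

Listing every leaf is only practical for small cases; it serves here to check the ranks quoted for 1,2,3 and 3,1,2.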


Example 3.4 Direct insertion order Another way to create a permutation is by direct insertion.
(Often this is simply called "insertion.") Suppose that we have an ordered list of k items into
which we want to insert a new item. It can be placed in any of k + 1 places: namely, at the end of
the list or immediately before the ith item where 1 ≤ i ≤ k. By starting out with 1, choosing one of
the two places to insert 2 in this list, choosing one of the three places to insert 3 in the new list and,
finally, choosing one of the four places to insert 4 in the newest list, we will have produced a
permutation of 4. To do this, we need to have some convention as to how the places for insertion are
numbered when the list is written from left to right. The obvious choice is from left to right;
however, right to left is often preferred because the leftmost leaf is then 12...n as it is for lex order.
We'll use the obvious choice (left to right) because it is less confusing.

Here's  the  derivation  of  the  permutation  associated  with  the  insertions  1,  3  and  2. 


number to insert       2      3        4          Answer
partial permutation    1      21       213        2413
positions in list      1 2    1 2 3    1 2 3 4
position to use        1      3        2

Figure  3.7  is  the  decision  tree  for  generating  permutations  of  3  by  direct  insertion.  The  labels 
on  the  vertices  are,  of  course,  the  partial  permutations,  with  the  full  permutations  appearing  on  the 
leaves. The decision labels on the edges are the positions in which to insert the next number. To the
left  of  the  tree  we've  indicated  which  number  is  to  be  inserted.  This  isn't  really  necessary  since  the 
numbers  are  always  inserted  in  increasing  order  starting  with  2.  Notice  that  the  labels  on  the  leaves 
are  no  longer  in  lex  order  because  we  constructed  the  permutations  differently.  Had  we  labeled  the 
vertices  with  the  positions  used  for  insertion,  the  leaves  would  then  be  labeled  in  lex  order.  If  you 
don't  see  why  this  is  so,  label  the  vertices  of  the  decision  tree  with  the  insertion  positions. 

Unlike lex order, it is not immediately clear that this method gives all permutations exactly once.
We now prove this by induction on n. The case n = 1 is trivial since there are no insertions. Suppose
n > 1 and that $a_1, \ldots, a_n$ is a permutation of 1, ..., n. If $a_k = n$, then the last insertion must have
been to put n in position k and the partial permutation of 1, ..., n - 1 before this insertion was
$a_1, \ldots, a_{k-1}, a_{k+1}, \ldots, a_n$. By the induction assumption, the insertions leading to this permutation
are unique and so we are done.

Like  the  method  of  lex  order  generation,  the  method  of  direct  insertion  generation  can  be  used 
for  other  things  besides  permutations.  However,  direct  insertion  cannot  be  applied  as  widely  as  lex 
order.  Lex  order  generation  works  with  anything  that  can  be  thought  of  as  an  (ordered)  list,  but 
direct insertion requires more structure. □

Figure 3.7 The decision tree for the permutations of 3 in direct insertion order. We've omitted commas.
The root is 1; first 2 is inserted, then 3. The leaves, read left to right, are 321 231 213 312 132 123.
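Direct insertion is straightforward to sketch in code. The version below uses the left-to-right numbering of places adopted in Example 3.4; the function names are ours, and the use of `list.insert` is just one convenient way to realize "immediately before the ith item".

```python
from itertools import product


def insert_sequence(positions):
    """Build a permutation by direct insertion.

    positions[0] is where 2 goes, positions[1] where 3 goes, and so on.
    Places are numbered from the left starting at 1; place i means
    "immediately before the ith item", and the last place is the end.
    """
    perm = [1]
    for offset, place in enumerate(positions):
        perm.insert(place - 1, offset + 2)   # insert before index place - 1
    return perm


def insertion_order(n):
    """All permutations of 1..n, in the left-to-right order of the leaves."""
    choices = [range(1, k + 1) for k in range(2, n + 1)]  # k places when inserting k
    return [insert_sequence(p) for p in product(*choices)]


print(insert_sequence([1, 3, 2]))   # the insertions 1, 3 and 2 from the example
print(insertion_order(3))
```

The second line of output can be compared with the leaves of Figure 3.7, and running `insertion_order(4)` gives 24 distinct lists, in line with the induction argument.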


Example 3.5 Transposition order Yet another way to create a permutation is by transposition.
It is similar to direct insertion; however, instead of pushing a number into a space between
elements of the partial permutation, we put it into a place that may already be occupied, bumping
that element out of the way. If an element is bumped, it is moved to the end of the list.

Here's  the  derivation  of  the  permutation  associated  with  1,  3  and  2.  Note  that,  while  direct 
insertion  used  positions  between  elements  of  the  partial  permutation,  transposition  uses  the  positions 
of  the  partial  permutation  plus  one  more  space  at  the  end. 


number to insert       2      3        4          Answer
partial permutation    1      2 1      2 1 3      2 4 3 1
positions in list      1 2    1 2 3    1 2 3 4
position to use        1      3        2

As with direct insertion, unique generation of all permutations is not obvious. A similar inductive
proof works: Suppose the permutation is $a_1, \ldots, a_n$ and $a_k = n$. The preceding partial permutation is

•  $a_1, \ldots, a_{n-1}$, if k = n,

•  $a_1, \ldots, a_{k-1}, a_n, a_{k+1}, \ldots, a_{n-1}$, if k < n.

By the induction assumption, we can generate any (n - 1)-long permutation, so generate whichever
of the above permutations of {1, ..., n - 1} is required and then put n into position k, bumping if
necessary. Thus the map from the n! possible transposition sequences to permutations is a surjection.
Since the domain and image of the map are the same size, the map is a bijection. This completes
the proof.

Why is this called the transposition method? Suppose the sequence of positions is $p_2, \ldots, p_n$.
Then the permutation is obtained by starting with the identity permutation and applying, in order,
the transpositions $(2, p_2), (3, p_3), \ldots, (n, p_n)$, where the pseudo-transposition (k, k) is interpreted
as doing nothing.

Since this seems more complicated than lex order and insertion order, why would anyone use
it? It is an easy way to generate random permutations: Generate a sequence $p_2, \ldots, p_n$ of random
integers where $1 \le p_k \le k$. Then apply the method of the previous paragraph to create the random
permutation. □
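Both descriptions of the transposition method, and its use for random generation, can be sketched as follows. The function names are ours. The sketch assumes that each position p_k is drawn uniformly and independently from 1, ..., k, which is what makes every permutation equally likely, and it reads the transposition (k, p_k) as swapping the entries in places k and p_k.

```python
import random


def by_bumping(positions):
    """Put 2, 3, ... into the given places; the old occupant of an
    occupied place is bumped to the end of the list."""
    perm = [1]
    for offset, place in enumerate(positions):
        new = offset + 2
        if place == len(perm) + 1:
            perm.append(new)                 # the extra space at the end
        else:
            perm.append(perm[place - 1])     # bump the occupant to the end
            perm[place - 1] = new
    return perm


def by_transpositions(positions):
    """Start from the identity and swap the entries in places k and p_k."""
    n = len(positions) + 1
    perm = list(range(1, n + 1))
    for k, place in enumerate(positions, start=2):
        perm[k - 1], perm[place - 1] = perm[place - 1], perm[k - 1]
    return perm


def random_permutation(n, rng=random):
    """Choose each p_k uniformly from 1..k and apply the method above."""
    return by_bumping([rng.randint(1, k) for k in range(2, n + 1)])


print(by_bumping([1, 3, 2]), by_transpositions([1, 3, 2]))
print(random_permutation(6))
```

The first line shows the two descriptions agreeing on the positions 1, 3 and 2 from the worked example; the second prints one random permutation of 6.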
Example 3.6 Programs to list the leaves A decision tree consists of a set of sequential
decisions, and so the code fragment discussed in connection with the Rule of Product is relevant.
Here it is in decision tree language:

For each first decision d1
    ...
        For each kth decision dk
            List the structure arising from the decisions d1, ..., dk.
        End for
    ...
End for

We now give the code to generate the leaves of Figure 3.2. In Create(a, b), we find the next
possible decisions given the two preceding ones, a followed by b. Note that, if d1 is the first decision,
Create(V, d1) generates the correct set of possible second decisions.

Function Create(a, b)         /* Create next set of decisions. */
    If b = V
        Return {C}            /* Avoid VV pattern. */
    Elseif a = C
        Return {V}            /* Avoid CCC pattern. */
    Else
        Return {C,V}
    End if
End

Procedure ListCV
    D1 = {C,V}
    For d1 ∈ D1:
        D2 = Create(V, d1)    /* V is fake zeroth decision. */
        For d2 ∈ D2:
            D3 = Create(d1, d2)
            For d3 ∈ D3:
                D4 = Create(d2, d3)
                For d4 ∈ D4:
                    D5 = Create(d3, d4)
                    For d5 ∈ D5: List d1 d2 d3 d4 d5 End for
                End for
            End for
        End for
    End for
End □
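For readers who want to run the procedure, here is one possible Python rendering. It is a sketch rather than a line-by-line translation: `create` mirrors Create(a, b), but the five nested loops of ListCV are replaced by a recursive helper (our choice) so that the pattern length becomes a parameter.

```python
def create(a, b):
    """Next possible decisions given the two preceding ones, a then b."""
    if b == "V":
        return ["C"]          # avoid the VV pattern
    if a == "C":
        return ["V"]          # b is C here, so avoid the CCC pattern
    return ["C", "V"]


def list_cv(length=5):
    """List the leaves of the C-V decision tree, left to right."""
    leaves = []

    def extend(pattern):
        if len(pattern) == length:
            leaves.append(pattern)
            return
        if not pattern:
            choices = ["C", "V"]              # D1 in the pseudocode
        else:
            a, b = ("V" + pattern)[-2:]       # V is the fake zeroth decision
            choices = create(a, b)
        for decision in choices:
            extend(pattern + decision)

    extend("")
    return leaves


print(list_cv())
```

With the default length the output should be the seven patterns grouped in Figure 3.3, in lex order with C before V.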


Exercises 


The  following  exercises  are  intended  to  give  you  some  hands-on  experience  with  simple  decision  trees  before 
we  begin  our  more  systematic  study  of  them  in  the  next  section. 

3.1.1. What permutations of 3 have the same rank in lexicographic order and insertion order?

3.1.2. Draw the decision tree for the permutations of 4 in lexicographic order.

(a)  What  is  the  rank  of  2,3,1,4?  of  4,1,3,2? 

(b)  What  permutation  in  this  tree  has  rank  5?  rank  15? 

3.1.3.  Draw  the  decision  tree  for  the  permutations  of  4  in  direct  insertion  order. 

(a)  What  is  the  rank  of  2314?  of  4132? 

(b)  What  permutation  in  this  tree  has  rank  5?  rank  15? 

3.1.4.  Draw  the  decision  tree  for  the  permutations  of  4  in  transposition  order. 

(a)  What  is  the  rank  of  2314?  of  4132? 

(b)  What  permutation  in  this  tree  has  rank  5?  rank  15? 

3.1.5.  Draw  a  decision  tree  to  list  all  6-long  sequences  of  A's  and  B's  that  satisfy  the  following  conditions. 

(i)  There  are  no  adjacent  A's. 

(ii)  There  are  never  three  B's  adjacent. 

3.1.6. Draw the decision tree for the strictly decreasing functions in $\underline{6}^{\underline{4}}$ in lex order.

(a)  What  is  the  rank  of  5,4,3,1?  of  6,5,3,1? 

(b)  What  function  has  rank  0?  rank  7? 

(c)  What  is  the  largest  rank  and  which  function  has  it? 

(d) Your tree should contain the decision tree for the strictly decreasing functions in $\underline{5}^{\underline{4}}$. Indicate it
and use it to list those functions in order.

(e)  Indicate  how  all  the  parts  of  this  exercise  can  be  interpreted  in  terms  of  subsets  of  a  set. 

3.1.7. Draw the decision tree for the nonincreasing functions in $\underline{4}^{\underline{3}}$.

(a)  What  is  the  rank  of  321?  of  443? 

(b)  What  function  has  rank  0?  rank  15? 

(c)  What  is  the  largest  rank  and  which  function  has  it? 

(d) Your tree should contain the decision tree for the nonincreasing functions in $\underline{3}^{\underline{3}}$. Circle that tree
and use its leaves to list the nonincreasing functions in $\underline{3}^{\underline{3}}$.

(e) Can you find the decision tree for the strictly decreasing functions in $\underline{4}^{\underline{3}}$ in your tree?

3.1.8.  Describe  the  lex  order  decision  tree  for  producing  all  the  names  in  Example  2  in  the  Introduction  to 
Part  I. 

*3.1.9.  In  a  list  of  the  permutations  on  n  with  n  >  1,  the  permutation  just  past  the  middle  of  the  list  has 
rank  n!/2. 

(a) What is the permutation of n that has rank 0 in insertion order? rank n! - 1? rank n!/2?

(b) What is the permutation of n that has rank 0 in transposition order? rank n! - 1? rank n!/2?

(c) What is the permutation of n that has rank 0 in lex order? rank n! - 1? rank n!/2 when n is
even? rank n!/2 when n is odd?

Hint.  For  n!/2,  look  at  some  pictures  of  the  trees. 

3.1.10.  How  many  ways  can  two  full  houses  be  formed? 

3.1.11.  How  many  ways  can  two  5-card  hands  be  formed  so  that  the  first  is  a  full  house  and  the  second 
contains  two  pair? 

(a)  Do  this  in  the  obvious  manner  by  first  choosing  the  full  house  and  then  choosing  the  second 
hand. 

(b) Do this in the less obvious manner by first choosing the second hand and then choosing the full
house.

(c) What lesson can be learned from the previous parts of this exercise?

3.1.12. How many ways can three full houses be formed?

3.1.13. How many 7-letter "words" can be formed from ERRONEOUSNESS where no letter is used more
times than it appears in ERRONEOUSNESS?


*3.2   Ranking  and  Unranking 


Suppose  we  have  a  decision  tree  whose  leaves  correspond  to  a  certain  class  of  objects;  e.g.,  the 
permutations  of  3.  An  important  property  of  such  a  decision  tree  is  that  it  leads  to  a  bijection 
between  the  n  leaves  of  the  tree  and  the  set  {0, 1, . . . ,  n  —  1}.  In  the  previous  section  we  called  this 
bijection  the  rank  of  the  leaf. 

Definition 3.2 RANK and UNRANK Suppose we are given a decision tree with n
leaves. If v is a leaf, the function RANK(v) is the number of leaves to the left of it in the
decision tree. Since RANK is a bijection from leaves to {0, 1, ..., n - 1}, it has an inverse which
we call UNRANK.

How  can  these  functions  be  used? 

•  Simpler storage: They can simplify the structure of an array required for storing data about the
objects at the leaves. The RANK function is the index into the storage array. The UNRANK
function tells which object has data stored at a particular array location. Thus, regardless of
the nature of the N objects at the leaves, we need only have an N-long singly indexed array to
store data.

•  Less storage: Suppose we are looking at the 720 permutations of 6. Using RANK and UNRANK,
we need a 720-long array to store data. If we took a naive approach, we might use a six dimensional
array (one dimension for each of f(1), ..., f(6)), and each dimension would be able to
assume six values. Not only is this more complicated than a singly dimensioned array, it requires
6^6 = 46,656 storage locations instead of 720.

•  Random generation: Suppose that you want to study properties of typical objects in a set of
N objects, but that N is much too large to look at all of them. If you have a random number
generator and the function UNRANK, then you can look at a random selection of objects: As
often as desired, generate a random integer, say r, between 0 and N - 1 inclusive and study the
object UNRANK(r). This is sometimes done to collect statistics on the behavior of an algorithm.
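The three uses above can be mimicked with a lookup table. This is only an interface sketch under our own names (`rank`, `unrank`, `data`): it tabulates all 720 permutations of 6, which is exactly what one cannot afford when N is large, so it illustrates how RANK and UNRANK are used rather than how to compute them efficiently.

```python
import random
from itertools import permutations

objects = list(permutations(range(1, 7)))            # 720 permutations of 6, lex order
rank = {perm: i for i, perm in enumerate(objects)}   # RANK as a lookup table


def unrank(r):
    """UNRANK as list indexing into the tabulated leaves."""
    return objects[r]


# Simpler and smaller storage: one slot per object, indexed by rank.
data = [None] * len(objects)
data[rank[(2, 1, 3, 4, 5, 6)]] = "some stored value"

# Random generation: pick a random rank and study the object found there.
r = random.randrange(len(objects))
print(len(data), 6 ** 6, r, unrank(r))
```

The first two printed numbers contrast the 720 slots of the singly indexed array with the 6^6 slots of the naive six dimensional one.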

You'll now learn a method for computing RANK and UNRANK functions for some types of
decision trees. The basic idea behind all these methods is that, when you make a decision, all the
decisions to the left of the one just made contribute the leaves of their "subtrees" to the rank of the
leaf we are moving toward.

To talk about this conveniently, we need some terminology. Let e = (v, w) be an edge of a tree
associated with a decision D at vertex v; that is, e connects v to the vertex w that is reached by
making decision D at v. The residual tree of (v, w), written R(v, w) or R(e), is v together with all
edges and vertices that can be reached from v by starting with one of the D decisions to the left
of (v, w). For example, the residual tree of the edge labeled 2 in Figure 3.6(b) consists of the edges
labeled 1, 23 and 32 and the four vertices attached to these edges. The edge labeled 2 has D = 1.
When D = 0, there are no edges to the left of e that have v as a parent. Thus, the residual tree
of e consists of just one vertex when D = 0. Let Δ(e) be the number of leaves in R(e), not counting
the root. Thus Δ(e) = 0 when e corresponds to D = 0 and Δ(e) > 0 otherwise. The following result
forms the basis of our calculations of RANK and UNRANK.

Figure 3.8 Decision trees for permutations of {1,2,3}. The decisions taken to reach 312 are shown in
double lines. The lex order is on the left and direct insertion on the right. We've omitted commas.


Theorem 3.1  Rank formula  If the sequence of edges on the path from the root to a leaf X
in a decision tree is $e_1, e_2, \ldots, e_m$, then

$$\mathrm{RANK}(X) = \sum_{i=1}^{m} A(e_i).$$

Before proving this, let's look at some simple situations to see what's happening. In Figure 3.8
we have the lex and insertion order decision trees for permutations of 3 with the decisions needed to
reach 312 drawn double. For lex order we have A(e) = 4 for e = (•, 3) and A(e) = 0 for e = (3, 312).
(We write this simply as A(•, 3) = 4 and A(3, 312) = 0.) For the insertion order we have A(1, 12) = 0
and A(12, 312) = 2. This gives ranks of 4 and 2 in the two trees.

Proof:  Now let's prove the theorem. We'll use induction on m. Let $e_1 = (v, w)$. The leaves to the
left of X in the tree rooted at v are the leaves in R(v, w) together with the leaves to the left of X in
the tree starting at w. In other words, RANK(X) is A(v, w) plus the rank of X in the tree starting
at w.

Suppose that m = 1. Since m = 1, w must be a leaf and hence w = X. Thus the tree starting
at w consists simply of X and so contains no leaves to the left of X. Hence X has rank 0 in that
tree. This completes the proof for m = 1.

Suppose the theorem is true for decision sequences of length m − 1. Look at the subtree that is
rooted at w. The sequence of edges from the root w to X is $e_2, \ldots, e_m$. Since $R(e_i)$ (for i > 1) is the
same for the tree starting at w as it is for the full tree, it follows by the induction assumption that
X has rank $\sum_{i=2}^{m} A(e_i)$ in this tree. This completes the proof. □
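To make the rank formula concrete, here is a minimal Python sketch. The representation is our own assumption, not something fixed by the text: a leaf is a string and an internal vertex is a list of its subtrees from left to right, so decision D at a vertex selects entry D of its list. The names count_leaves and rank_by_path are likewise ours.

```python
def count_leaves(tree):
    """A leaf is any non-list value; an internal vertex is a list of subtrees."""
    if not isinstance(tree, list):
        return 1
    return sum(count_leaves(child) for child in tree)

def rank_by_path(tree, decisions):
    """Add up A(e) along the path: decision d contributes the leaves of the
    d subtrees lying to its left (the residual tree of that edge)."""
    rank = 0
    for d in decisions:
        rank += sum(count_leaves(child) for child in tree[:d])
        tree = tree[d]
    return rank

# The two trees of Figure 3.8 for permutations of {1,2,3}.
lex_tree = [["123", "132"], ["213", "231"], ["312", "321"]]
insertion_tree = [["123", "132", "312"], ["213", "231", "321"]]

print(rank_by_path(lex_tree, [2, 0]))        # decisions reaching 312 in lex order
print(rank_by_path(insertion_tree, [0, 2]))  # decisions reaching 312 in insertion order
```

The two printed values are 4 and 2, the ranks found for 312 in the discussion of Figure 3.8. Counting leaves by brute force is only sensible for tiny trees; the examples that follow replace it by formulas for A(e).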

Calculating  RANK 


Calculating RANK(X) is, in principle, now straightforward since we need only obtain formulas for
the $A(e_i)$'s. Unfortunately, each decision tree is different and it is not always so easy to obtain such
formulas. We'll work some examples to show you how it is done in some simple cases.

Example 3.7  Lex order rank of all functions  If $D_1, \ldots, D_k$ is a sequence of decisions in
the lex order tree for $\underline{n}^{\underline{k}}$ leading to a function $f \in \underline{n}^{\underline{k}}$, then $D_i = f(i) - 1$ since there are f(i) − 1
elements of n that are less than f(i). What is the value of RANK(f)?

By Theorem 3.1, we need to look at $R(e_i)$, where $e_i$ is an edge on the path. The structure
of $R(e_i)$ is quite simple and symmetrical. There are $D_i$ edges leading out from the root. Whichever
edge we take, it leads to a vertex that has n edges leading out. This leads to another vertex with
n edges leading out and so on until we reach a leaf. In other words, we make one $D_i$-way decision
and (k − i) n-way decisions. Each leaf of $R(e_i)$ is reached exactly once in this process and thus, by
the Rule of Product, $A(e_i) = D_i n^{k-i}$. Substituting this into the formula of Theorem 3.1,

$$\mathrm{RANK}(X) = \sum_{i=1}^{m} A(e_i),$$

with m = k and $D_i = f(i) - 1$, we have proved

Theorem 3.2  Lex order rank of all functions  For f in the lex order listing of all functions
in $\underline{n}^{\underline{k}}$,

$$\mathrm{RANK}(f) = \sum_{i=1}^{k} (f(i) - 1)\, n^{k-i}.$$

Figure 3.9  The lex order decision tree for the strictly decreasing functions in $\underline{5}^{\underline{3}}$. We've omitted commas.
(Leaves, from left to right: 321 421 431 432 521 531 532 541 542 543.)


Notice that when n = 10 this is just a way of writing the number whose arabic notation is
$D_1 D_2 \ldots D_k$. Indeed, the functions in $\underline{10}^{\underline{k}}$ written in one line form look just like k digit numbers
except that each digit has been increased by one. Lex order is then the same as numerical order.

This realization, generalized from base 10 to base n, gives another approach to the formula for
RANK(f). It also makes it easy to see how to find the function that immediately follows f: Simply
write out the set of decisions that lead to f as a number in base n and add 1 to this number to
obtain the decisions for the function immediately following f. To get the function that is k further
on, add k. To get the function that is k earlier, subtract k. (If we have to carry or borrow past
the leftmost digit $D_1$, then we are attempting to go beyond the end of the tree.) For instance, the
successor of 1,3,2,3,3 in $\underline{3}^{\underline{5}}$ is obtained as follows. The decisions that lead to 1,3,2,3,3 are 0,2,1,2,2.
Since there are three possible decisions at each point, we can think of the decisions as the base 3
number 02122 and add 1 to it to obtain 02200, which we think of as decisions. Translating from
decisions to function values, we obtain 1,3,3,1,1 as the successor of 1,3,2,3,3. □
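The base n point of view translates directly into code. The following Python sketch is ours (the names rank_all_functions and successor are assumptions, and a function is stored as the list of its values in one line form); it evaluates the sum of Theorem 3.2 digit by digit and finds a successor by adding 1 with carries.

```python
def rank_all_functions(f, n):
    """Theorem 3.2: read the decisions f(i) - 1 as the digits of a base-n number."""
    rank = 0
    for value in f:
        rank = rank * n + (value - 1)
    return rank

def successor(f, n):
    """Add 1 to the decision sequence in base n; None if we would carry past D_1."""
    d = [value - 1 for value in f]
    i = len(d) - 1
    while i >= 0 and d[i] == n - 1:   # digit is already n - 1: reset it and carry
        d[i] = 0
        i -= 1
    if i < 0:
        return None                   # f was the last function in the list
    d[i] += 1
    return [digit + 1 for digit in d]

f = [1, 3, 2, 3, 3]
print(rank_all_functions(f, 3))   # the base 3 number 02122
print(successor(f, 3))            # decisions 02200, that is 1,3,3,1,1
```

The loop in rank_all_functions is Horner's rule, so no powers of n are computed explicitly.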

Example 3.8  Strictly decreasing functions  We will now study the strictly decreasing
functions in $\underline{n}^{\underline{k}}$, which we have observed correspond to k element subsets of n.

The decision tree for lex order with n = 5 and k = 3 is in Figure 3.9. You can use it to help you
visualize some of the statements we'll be making.

Calculating the rank of a leaf differs from the preceding examples because the tree lacks
symmetry; however, it has another feature which enables us to calculate the rank quite easily. Let's look
for the rank of the strictly decreasing function $f \in \underline{n}^{\underline{k}}$.

Let $e_1$ be the first edge on the path to the leaf for f. $R(e_1)$ is the tree for all strictly decreasing
functions $g \in \underline{n}^{\underline{k}}$ with g(1) < f(1). In other words, $R(e_1)$ is the tree for the strictly decreasing
functions in $\underline{f(1)-1}^{\,\underline{k}}$.

We can generalize this observation. Suppose the path is $e_1, e_2, \ldots, e_k$. Suppose that g is a leaf
of $R(e_i)$. This can happen if and only if $g \in \underline{n}^{\underline{k}}$ is a strictly decreasing function with

g(1) = f(1), g(2) = f(2), ..., g(i − 1) = f(i − 1), and g(i) < f(i).

Since g(j) is determined for j < i, look just at g(i), g(i + 1), ..., g(k). It is an arbitrary strictly
decreasing function with initial value less than f(i). Thus the leaves of $R(e_i)$ can be associated with
the strictly decreasing functions in $\underline{f(i)-1}^{\,\underline{k+1-i}}$. Since there are $\binom{b}{a}$ strictly decreasing functions in
$\underline{b}^{\underline{a}}$, $A(e_i) = \binom{f(i)-1}{k+1-i}$. Thus

Theorem 3.3  Lex order rank of strictly decreasing functions  For f in the lex order
listing of all strictly decreasing functions in $\underline{n}^{\underline{k}}$,

$$\mathrm{RANK}(f) = \sum_{i=1}^{k} \binom{f(i)-1}{k+1-i}.$$

Let's use the theorem to calculate the rank of the strictly decreasing function f = 96521. What
are k and n? Since there are five numbers in the list, we can see that k = 5; however, there's no
way to determine n. This means that f has not been fully specified since we don't know its range.
On the other hand, our formula doesn't require n and, as we remarked when defining functions, a
specification of a function which omits the range may often be considered complete anyway. By our
formula

$$\mathrm{RANK}(96521) = \binom{8}{5} + \binom{5}{4} + \binom{4}{3} + \binom{1}{2} + \binom{0}{1} = 56 + 5 + 4 + 0 + 0 = 65.$$

What is the decision sequence for a strictly decreasing function? For the strictly decreasing
functions in $\underline{n}^{\underline{k}}$, the decision associated with choosing f(i) = j is

$$i + j - (k + 1). \qquad (3.1)$$

Why is this so? We must choose distinct values for f(i + 1), ..., f(k). Thus there must be at least
k − i values less than j. Hence the smallest possible value for f(i) is k + 1 − i. Thus there are
j − (k + 1 − i) possible values less than j.

Note that (3.1) does not depend on n and so, for each k, there is one decision tree that works for
all n. The root has an infinite number of children, one for each value of f(1). Thereafter the number
of decisions is finite since f(i) < f(1) for i > 1. This "universal" decision tree arises because, once
f(1) is given, the value of n imposes no constraints on f(i) for i > 1. Had we used strictly increasing
functions instead, this would not have been the case because n ≥ f(i) > f(1) for all i > 1. □
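Theorem 3.3 and the decision rule of this example are short enough to check by machine. The Python sketch below is ours (the function names are assumptions); it stores f as the list of its values and relies on the standard library function math.comb, which returns 0 when the lower index exceeds the upper one.

```python
from math import comb

def rank_strictly_decreasing(f):
    """Theorem 3.3: sum of C(f(i) - 1, k + 1 - i) for i = 1, ..., k."""
    k = len(f)
    # With i0 = i - 1 counting from 0, the lower index k + 1 - i equals k - i0.
    return sum(comb(f[i0] - 1, k - i0) for i0 in range(k))

def decision_sequence(f):
    """The decision for choosing f(i) = j is i + j - (k + 1)."""
    k = len(f)
    return [i + j - (k + 1) for i, j in enumerate(f, start=1)]

print(rank_strictly_decreasing([9, 6, 5, 2, 1]))
print(decision_sequence([9, 6, 5, 2, 1]))
```

The first line printed is 65, the value computed by hand above. Notice that n never appears as a parameter, in agreement with the remark that the formula does not require it.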

*Example 3.9  Direct insertion order rank of all permutations  Now let's compute ranks
for the permutations of n in direct insertion order, a concept defined in Example 3.4. We can use
practically the same approach that was used in Example 3.7. Number the positions where we can
insert j as 0, 1, 2, ..., j − 1. (We started with 0, not 1 for the first position.) If $D_2, D_3, \ldots, D_n$ is
the sequence of decisions, then $D_j$ is the position into which j is to be inserted. Note that we've
started the sequence subscripting with 2 so that the subscript j equals the number whose insertion
is determined by $D_j$.

Since  we  usually  tend  to  write  permutations  in  one  line  form,  we  should  develop  a  method  for 
determining  Dj  from  the  one  line  form.  Here  is  a  simple  rule: 

Write  the  permutation  in  one  line  form.  Count  the  elements  of  the  permutation  from  right 
to  left  up  to  but  not  including  j  and  ignoring  all  elements  that  exceed  j.  That  count  is  the 
position  in  which  j  was  inserted  and  so  equals  Dj. 

As an illustration, consider the permutation with the one line form 4,6,1,2,5,7,3. Starting from the
right, counting until we reach 2 and ignoring numbers exceeding 2, we get that $D_2 = 0$. Similarly,
$D_3 = 0$. Since we encounter 3, 2 and 1 before reaching 4, $D_4 = 3$. Only 3 intervenes when we search
for 5, so $D_5 = 1$. Here's the full list:

$D_2 = 0$,  $D_3 = 0$,  $D_4 = 3$,  $D_5 = 1$,  $D_6 = 4$,  $D_7 = 1$.

Again $R(e_i)$ has a nice symmetrical form as we choose a route to a leaf:

•  Choose one of $D_i$ edges AND

•  Choose one of i + 1 edges (position for i + 1) AND

   ⋮

•  Choose one of n edges (position for n).

By the Rule of Product, $A(e_i) = D_i (i+1)(i+2) \cdots n = D_i\, n!/i!$ and so we have

Theorem 3.4  Direct insertion order rank  For f in the direct insertion order listing of
all permutations on n,

$$\mathrm{RANK}(f) = \sum_{i=2}^{n} D_i \frac{n!}{i!}.$$
This  formula  together  with  our  previous  calculations  gives  us 

RANK(4,6,1,2,5,7,3)  =  0 × 7!/2! + 0 × 7!/3! + 3 × 7!/4!
                        + 1 × 7!/5! + 4 × 7!/6! + 1 × 7!/7!  =  701.

What  permutation  immediately  follows  /  =  9,8,7,3,1,6,5,2,4  in  direct  insertion  order?  The 
obvious  way  to  do  this  is  to  calculate  UNRANK(1  +  RANK(9,  8,  7,  3, 1,  6,  5,  2, 4)),  which  is  a  lot  of 
work.  By  extending  an  idea  in  the  previous  example,  we  can  save  a  lot  of  effort.  Here  is  a  general 
rule: 

The  next  leaf  in  a  decision  tree  is  the  one  with  the  lexically  next  (legal)  decision 
sequence. 

Let's  apply  it.  The  decisions  needed  to  produce  /  are  0,2,0,2,3,6,7,8.  If  we  simply  add  1  to  the  last 
decision,  we  obtain  an  illegal  sequence  because  there  are  only  nine  possible  decisions  there.  Just  like 
in arithmetic, we reset it to 0 and carry. Repeating the process, we finally obtain 0,2,0,2,4,0,0,0.
This  corresponds  to  the  sequence  1,3,1,3,5,1,1,1  of  insertions.  You  should  be  able  to  carry  out  these 
insertions:  start  with  the  sequence  1,  insert  2  in  position  1,  then  insert  3  in  position  3,  and  so  on, 
finally  inserting  9  in  position  1.  The  resulting  permutation  is  3,6,1,5,2,4,7,8,9. 

For special situations such as the ones we've been considering, one can find other short cuts that
make some of the steps unnecessary. In fact, one need not even calculate the decision sequence. We
will not discuss these short cuts since they aren't applicable in most situations. Nevertheless, you
may enjoy the challenge of trying to find such shortcuts for permutations in lex and direct insertion
order and for strictly decreasing functions in lex order. □
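The calculations of this example can be sketched in Python as follows. All function names are ours, a permutation is stored as a list in one line form, and the decision sequence is stored as the list $[D_2, \ldots, D_n]$. The rule for reading $D_j$ from the one line form is implemented by counting the elements smaller than j that lie to the right of j.

```python
from math import factorial

def insertion_decisions(perm):
    """Return [D_2, ..., D_n] for a permutation given in one line form."""
    where = {value: index for index, value in enumerate(perm)}
    return [sum(1 for v in perm[where[j] + 1:] if v < j)
            for j in range(2, len(perm) + 1)]

def rank_insertion(perm):
    """Theorem 3.4: the sum of D_j * n!/j! for j = 2, ..., n."""
    n = len(perm)
    return sum(d * (factorial(n) // factorial(j))
               for j, d in enumerate(insertion_decisions(perm), start=2))

def next_decisions(decisions):
    """Lexically next legal decision sequence (D_j is at most j - 1), or None."""
    d = list(decisions)
    i = len(d) - 1                     # d[i] holds D_j with j = i + 2
    while i >= 0 and d[i] == i + 1:    # already the largest legal value
        d[i] = 0                       # reset to 0 and carry
        i -= 1
    if i < 0:
        return None
    d[i] += 1
    return d

def build_permutation(decisions):
    """Carry out the insertions: j goes where exactly D_j elements lie to its right."""
    perm = [1]
    for j, d in enumerate(decisions, start=2):
        perm.insert(len(perm) - d, j)
    return perm

print(rank_insertion([4, 6, 1, 2, 5, 7, 3]))
f = [9, 8, 7, 3, 1, 6, 5, 2, 4]
print(build_permutation(next_decisions(insertion_decisions(f))))
```

The first value printed is 701 and the second line is the permutation 3,6,1,5,2,4,7,8,9, both as found by hand above.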


Calculating  UNRANK 


The  basic  principle  for  unranking  is  greed. 

Definition  3.3  Greedy  algorithm  A  greedy  algorithm  is  a  multistep  algorithm  that 
obtains  as  much  as  possible  at  the  present  step  with  no  concern  for  the  future. 


The method of long division is an example of a greedy algorithm: If d is the divisor, then at each
step you subtract off the largest possible number of the form $(k \times 10^n)d$ from the dividend that
leaves a nonnegative number and has 1 ≤ k ≤ 9.

The greedy algorithm for computing UNRANK is to choose $D_1$, then $D_2$ and so on, each $D_i$ as
large as possible at the time it is chosen. What do we mean by "as large as possible?" Suppose we
are calculating UNRANK(m). If $D_1, \ldots, D_{i-1}$ have been chosen, then make $D_i$ as big as possible
subject to the condition that $\sum_{j=1}^{i} A(e_j) \le m$.

Why does this work? Suppose that $D_1, \ldots, D_{i-1}$ have been chosen by the greedy algorithm and
are part of the correct path. (This is certainly true when i = 1 because the sequence is empty!) We
will prove that $D_i$ chosen by the greedy algorithm is also part of the correct path.

Suppose that a path starts $D_1, \ldots, D_{i-1}, D'_i$, and let $e'_i$ be the edge for decision $D'_i$. If
$D'_i > D_i$, this cannot be part of the correct path
because the definition of $D_i$ gives $A(e_1) + \cdots + A(e_{i-1}) + A(e'_i) > m$.

Now suppose that $D'_i < D_i$. Let x be the leftmost leaf reachable from the decision sequences
that start $D_1, \ldots, D_i$. Clearly $\mathrm{RANK}(x) = A(e_1) + \cdots + A(e_i) \le m$. Thus any leaf to the left of x
will have rank less than m. Since all leaves reachable from $D'_i$ are to the left of x, $D'_i$ is not part of
the correct decision sequence.

We have proven that if $D'_i \ne D_i$, then $D_1, \ldots, D_{i-1}, D'_i$ is not part of the correct path. It follows
that $D_1, \ldots, D_{i-1}, D_i$ must be part of the correct path.

As  we  shall  see,  it's  a  straightforward  matter  to  apply  the  greedy  algorithm  to  unranking  if  we 
have  the  values  of  A  available  for  various  edges  in  the  decision  tree. 
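Here is a minimal Python sketch of the greedy idea for a decision tree that is small enough to store explicitly. The representation is our own assumption: a leaf is a string and an internal vertex is a list of its subtrees from left to right. The names count_leaves and unrank_greedy are also ours. At each vertex the sketch moves right for as long as the whole subtree being skipped lies to the left of the target leaf, which is the same as making the decision as large as possible.

```python
def count_leaves(tree):
    """A leaf is any non-list value; an internal vertex is a list of subtrees."""
    if not isinstance(tree, list):
        return 1
    return sum(count_leaves(child) for child in tree)

def unrank_greedy(tree, m):
    """Return (decision sequence, leaf) for the leaf of rank m."""
    if not 0 <= m < count_leaves(tree):
        raise ValueError("rank out of range")
    decisions = []
    while isinstance(tree, list):
        d = 0
        # Skipping subtree d adds its leaves to A(e); do so while the total stays <= m.
        while m >= count_leaves(tree[d]):
            m -= count_leaves(tree[d])
            d += 1
        decisions.append(d)
        tree = tree[d]
    return decisions, tree

lex_tree = [["123", "132"], ["213", "231"], ["312", "321"]]
print(unrank_greedy(lex_tree, 4))
```

For the lex order tree of Figure 3.8 and m = 4 this returns the decisions 2, 0 and the leaf 312. In practice one replaces count_leaves by a formula for A, as the examples below do.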

Example 3.10  Strictly decreasing functions  What strictly decreasing function f in $\underline{9}^{\underline{4}}$ has
rank 100?

In view of Theorem 3.3 (p. 78), it is more natural to work with the function values than the
decision sequence. Since $A(e_i) = \binom{f(i)-1}{k+1-i}$, it will be handy to have a table of binomial coefficients to
look at while calculating. Thus we've provided one in Figure 3.10. Since k = 4, the value of f(1) − 1
is the largest x such that $\binom{x}{4}$ does not exceed 100. We find x = 8 and $\binom{8}{4} = 70$. We now need to find
the largest x so that $\binom{x}{3} \le 100 - 70 = 30$. This gives 6 with 30 − 20 = 10 leaves unaccounted for. With $\binom{5}{2} = 10$
all leaves are accounted for. Thus we get $\binom{0}{1}$ for the last A. Our sequence of values for f(i) − 1 is
thus 8,6,5,0 and so f = 9, 7, 6, 1.

Although we specified that the range of f was 9, the value 9 was never used in our calculations.
This is like the situation we encountered when computing the rank. Thus, for ranking and unranking
decreasing functions in $\underline{n}^{\underline{k}}$, there is no need to specify n.

Now let's find the strictly decreasing function of rank 65 when k = 5. The rank formula is

$$\mathrm{RANK}(f) = \binom{f(1)-1}{5} + \binom{f(2)-1}{4} + \binom{f(3)-1}{3} + \binom{f(4)-1}{2} + \binom{f(5)-1}{1}. \qquad (3.2)$$

Since $\binom{8}{5} = 56 \le 65 < \binom{9}{5} = 126$, it follows that f(1) = 9. This leaves 65 − 56 = 9 to account for.
Since $\binom{5}{4} = 5 \le 9 < \binom{6}{4} = 15$, we have f(2) = 6 and 9 − 5 = 4 to account for. Since $\binom{4}{3} = 4$, f(3) = 5
and there is no more rank to account for. If we choose f(4) ≤ 2 and f(5) ≤ 1, the last two binomial
coefficients in (3.2) will vanish. How do we know what values to choose? The key is to remember
that f is strictly decreasing and takes on positive integer values. These conditions force f(5) = 1
and f(4) = 2. We got rid of the apparent multiple choice for f in this case, but will that always be
possible?

*       *       *       Stop  and  think  about  this!        *       *  * 

Yes,  in  any  unranking  situation  there  is  at  most  one  answer  because  each  thing  has  a  unique  rank. 
Furthermore,  if  the  desired  rank  is  less  than  the  total  number  of  things,  there  will  be  some  thing 
with that rank. Hence there is exactly one answer. □
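The greedy calculation of this example can be sketched in Python as follows. The name unrank_strictly_decreasing is ours, and math.comb from the standard library supplies the binomial coefficients in place of Figure 3.10. Starting the search at x = j − 1, where the binomial coefficient is 0, is what forces the final values (such as f(4) = 2 and f(5) = 1 above) once no rank remains to be accounted for.

```python
from math import comb

def unrank_strictly_decreasing(r, k):
    """Greedy unranking: for i = 1..k choose the largest x = f(i) - 1
    with C(x, k + 1 - i) not exceeding the rank still unaccounted for."""
    f = []
    for j in range(k, 0, -1):          # j = k + 1 - i
        x = j - 1                      # C(j - 1, j) = 0 never exceeds r
        while comb(x + 1, j) <= r:
            x += 1
        f.append(x + 1)
        r -= comb(x, j)
    return f

print(unrank_strictly_decreasing(100, 4))
print(unrank_strictly_decreasing(65, 5))
```

The two lines printed are the functions 9,7,6,1 and 9,6,5,2,1 found above. As in the text, n is not needed.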




  n \ k     0     1     2     3     4     5     6     7

    0       1
    1       1
    2       1     2
    3       1     3
    4       1     4     6
    5       1     5    10
    6       1     6    15    20
    7       1     7    21    35
    8       1     8    28    56    70
    9       1     9    36    84   126
   10       1    10    45   120   210   252
   11       1    11    55   165   330   462
   12       1    12    66   220   495   792   924
   13       1    13    78   286   715  1287  1716
   14       1    14    91   364  1001  2002  3003  3432
   15       1    15   105   455  1365  3003  5005  6435

Figure 3.10  Some binomial coefficients. The entry in row n and column k is $\binom{n}{k}$. For k > n/2 use
$\binom{n}{k} = \binom{n}{n-k}$.


*Example 3.11  Direct insertion order  What permutation f of 7 has rank 3,000 in insertion
order?

Let the decision sequence be $D_2, \ldots, D_7$. The number of leaves in the residual tree for $D_i$ is
$D_i \times 7!/i!$ by our derivation of Theorem 3.4 (p. 79). Since 7!/2! = 2,520, the greedy value for
$D_2$ is 1. The number of leaves unaccounted for is 3,000 − 2,520 = 480. Since 7!/3! = 840, the greedy
value for $D_3$ is 0 and 480 leaves are still unaccounted for. Since 7!/4! = 210, we get $D_4 = 2$ and
60 remaining leaves. Using 7!/5! = 42, 7!/6! = 7 and 7!/7! = 1, we get $D_5 = 1$, $D_6 = 2$ and $D_7 = 4$.
Thus the sequence of decisions is 1,0,2,1,2,4, which is the same as insertion positions. You should be
able to see that f = 2, 4, 7, 1, 6, 5, 3. □
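A Python sketch of this calculation follows; the name unrank_insertion is ours. Because the residual tree for $D_j$ has $D_j \times n!/j!$ leaves, the greedy choice of $D_j$ is just the quotient when the rank still unaccounted for is divided by n!/j!, and the remainder is carried to the next step.

```python
from math import factorial

def unrank_insertion(r, n):
    """Return ([D_2, ..., D_n], permutation) for rank r in direct insertion order."""
    decisions = []
    for j in range(2, n + 1):
        d, r = divmod(r, factorial(n) // factorial(j))   # greedy D_j and what is left
        decisions.append(d)
    perm = [1]
    for j, d in enumerate(decisions, start=2):
        perm.insert(len(perm) - d, j)    # exactly d elements end up to the right of j
    return decisions, perm

print(unrank_insertion(3000, 7))
```

This prints the decision sequence 1,0,2,1,2,4 and the permutation 2,4,7,1,6,5,3, as in the example. The sketch assumes 0 ≤ r < n!; it does not check for ranks beyond the end of the tree.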


*Gray  Codes 


Suppose  we  want  to  write  a  program  that  will  have  a  loop  that  runs  through  all  permutations  of  n. 
One  way  to  do  this  is  to  run  through  numbers  0,  . . .,  n!  —  1  and  apply  UNRANK  to  each  of  them. 
This  may  not  be  the  best  way  to  construct  such  a  loop.  One  reason  is  that  computing  UNRANK 
may  be  time  consuming.  Another,  sometimes  more  important  reason  is  that  it  may  be  much  easier 
to  deal  with  a  permutation  that  does  not  differ  very  much  from  the  previous  one.  For  example,  if  we 
had  n  large  blocks  of  data  of  various  lengths  that  had  to  be  in  the  order  given  by  the  permutation, 
it  would  be  nice  if  we  could  produce  the  next  permutation  simply  by  swapping  two  adjacent  blocks 
of  data. 

Methods  that  list  the  elements  of  a  set  so  that  adjacent  elements  in  the  list  are,  in  some  natural 
sense,  close  together  are  called  Gray  codes. 

Suppose  we  are  given  a  set  of  objects  and  a  notion  of  closeness.  How  does  finding  a  Gray 
code  compare  with  finding  a  ranking  and  unranking  algorithm?  The  manner  in  which  the  objects 
are  defined  often  suggests  a  natural  way  of  listing  the  objects,  which  leads  to  an  efficient  ranking 
algorithm  (and  hence  a  greedy  unranking  algorithm).  In  contrast,  the  notion  of  closeness  seldom 
suggests  a  Gray  code.  Thus  finding  a  Gray  code  is  usually  harder  than  finding  a  ranking  algorithm. 
If  we  are  able  to  find  a  Gray  code,  an  even  harder  problem  appears:  Find,  if  possible,  an  efficient 
ranking  algorithm  for  listing  the  objects  in  the  order  given  by  the  Gray  code. 

All we'll do is discuss one of the simplest Gray codes.

Example 3.12  A Gray code for subsets of a set  We want to look at all subsets of n. It will
be more convenient to work with the representation of subsets by n-strings of zeroes and ones: The
string $s_1 \ldots s_n$ corresponds to the subset S where $i \in S$ if and only if $s_i = 1$. Thus the all zeroes
string corresponds to the empty set and the all ones string to n.

Providing  ranking  and  unranking  algorithms  is  simple;  just  think  of  the  strings  as  n-digit  binary 
numbers.  While  adjacent  strings  in  this  ranking  are  close  together  numerically,  their  patterns  of 
zeroes  and  ones  may  differ  greatly.  As  we  shall  see,  this  tells  us  that  the  ranking  doesn't  give  a  Gray 
code. 

Before  we  can  begin  to  look  for  a  Gray  code,  we  must  say  what  it  means  for  two  subsets  (or, 
equivalently,  two  strings)  to  be  close.  Two  strings  will  be  considered  close  if  they  differ  in  exactly 
one  position.  In  set  terms,  this  means  one  of  the  sets  can  be  obtained  from  the  other  by  removing 
or  adding  a  single  element.  With  this  notion  of  closeness,  a  Gray  code  for  all  subsets  when  n  =  1  is 
0,  1.  A  Gray  code  for  all  subsets  when  n  =  2  is  00,  01,  11,  10. 

How can we produce a Gray code for all subsets for arbitrary n? There is a simple recursive
procedure. The following construction of the Gray code for n = 3 illustrates it.

        000        110
        001        111
        011        101
        010        100
You  should  read  down  the  first  column  and  then  down  the  second.  Notice  that  the  sequences  in  the 
first  column  begin  with  0  and  those  in  the  second  with  1.  The  rest  of  the  first  column  is  simply  the 
Gray  code  for  n  =  2  while  the  second  column  is  the  Gray  code  for  n  =  2,  read  from  the  last  sequence 
to  the  first. 

We now prove that this "two column" procedure for building a Gray code for subsets of an n-set
from the Gray code for subsets of an (n − 1)-set always works. Our proof will be by induction. For
n = 1, we have already exhibited a Gray code. Suppose that n > 1 and that we have a Gray code
for n − 1. (This is the induction assumption.) We form the first column by listing the Gray code for
n − 1 and attaching a 0 at the front of each (n − 1)-string. We form the second column by listing
the Gray code, starting with the last and ending with the first, and attaching a 1 at the front of
each (n − 1)-string. Within a column, there is never any change in the first position and there is
only a single change from line to line in the remaining positions because they are a Gray code by
the induction assumption. Between the bottom of the first column and the top of the second, the
only change is in the first position since the remaining n − 1 positions are the last element of our
Gray code for n − 1. This completes the proof.

As  an  extra  benefit,  we  note  that  the  last  element  of  our  Gray  code  differs  in  only  one  position 
from  the  first  element  (Why?),  so  we  can  cycle  around  from  the  last  element  to  the  first  by  a  single 
change. 

It is a simple matter to draw a decision tree for this Gray code. In fact, Figure 3.1 is practically
the n = 3 case: all we need to do is change some labels and keep track of whether we have reversed
the code for the second column. (Reversing the code corresponds to interchanging the 0 and 1 edges.)
The decision tree is shown in Figure 3.11. There is an easy way to decide whether the two edges
leading down from a vertex v should be labeled 0-1 or 1-0: If the edge e leading into v is a 0 decision
(i.e., 0 edge), use the same pattern that was used for e and the other decision e' it was paired with;
otherwise, reverse the order. Another way you can think of this is that as you trace a path from the
root to a leaf, going left and right, a 0 causes you to continue in the same direction and a 1 causes
you to switch directions. □

Figure 3.11  The decision tree for the Gray code for 3.
(Leaves, from left to right: 000 001 011 010 110 111 101 100.)
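The "two column" construction of Example 3.12 is easily turned into a recursive procedure. The Python sketch below is ours (the name gray_code is an assumption). As a convenience it starts the recursion at n = 0, where the only string is the empty one, rather than at n = 1 as in the proof; the list it produces for n = 1 is still 0, 1.

```python
def gray_code(n):
    """List all n-strings of zeroes and ones by the two column construction."""
    if n == 0:
        return [""]
    shorter = gray_code(n - 1)
    first_column = ["0" + s for s in shorter]             # the code for n - 1, in order
    second_column = ["1" + s for s in reversed(shorter)]  # the code for n - 1, last to first
    return first_column + second_column

def differ_in_one_position(s, t):
    return sum(a != b for a, b in zip(s, t)) == 1

code = gray_code(3)
print(code)
print(all(differ_in_one_position(code[i], code[i + 1]) for i in range(len(code) - 1)))
```

The list printed for n = 3 is 000, 001, 011, 010, 110, 111, 101, 100, the same as the leaves of Figure 3.11. Since the whole list is built in memory, this sketch is only suitable for small n.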

Exercises 


Unless  otherwise  noted,  the  domain  is  k  for  some  k  and  the  range  is  n  for  some  n.  Usually,  there  is  no  need 
to  specify  n  and  specifying  the  function  determines  k. 

3.2.1.  This  problem  concerns  strictly  decreasing  functions  listed  in  lex  order. 

(a)  Compute  the  rank  of  the  functions  11,6,4  and  9,6,3,1.  (Note  that  the  domain  of  the  first  is  3  and 
the  domain  of  the  second  is  4.) 

(b)  What  strictly  decreasing  function  with  domain  4  has  rank  35?  rank  400? 

(c)  What  strictly  decreasing  function  immediately  follows  9,6,3,2,1?  9,6,5,4? 

(d)  What  strictly  decreasing  function  immediately  precedes  9,6,3,2,1?  9,6,5,4? 

3.2.2.  This  problem  concerns  permutations  listed  in  direct  insertion  order. 

(a)  Compute  the  ranks  of  6,3,5,1,2,4  and  4,5,6,1,2,3. 

(b)  Determine  the  permutations  of  6  with  ranks  151  and  300. 

(c)  What  permutation  immediately  follows  9,8,7,1,2,3,4,5,6?  6,5,4,1,2,3,7,8,9? 

(d)  What  permutation  immediately  precedes  9,8,7,1,2,3,4,5,6?  6,5,4,1,2,3,7,8,9? 

3.2.3.  Consider the three strictly decreasing functions from k to n:

(1)  k, k − 1, ..., 2, 1

(2)  k + 1, k, ..., 3, 2

(3)  k + 2, k + 1, ..., 4, 3

Obtain simple formulas for their ranks.

3.2.4.  This  problem  concerns  nonincreasing  functions  listed  in  lex  order. 

(a)  Prove that

$$\mathrm{RANK}(f) = \sum_{i=1}^{k} \binom{f(i) + k - i - 1}{k + 1 - i}.$$

Hint.  Example 2.11 (p. 52) may be useful.

(b)  Compute  the  ranks  of  5,5,4,2,1,1  and  6,3,3. 

(c)  What  nonincreasing  function  on  4  has  rank  35?  400? 

3.2.5.  This  problem  concerns  the  permutations  listed  in  lex  order. 

(a)  Obtain a formula for RANK(f) in terms of the decisions $D_1, \ldots, D_n$ (or $D_1, \ldots, D_{n-1}$ if f(n − 1)
and f(n) are considered as being specified at the same time).

(b)  Describe  a  method  for  going  from  a  permutation  in  one  line  form  to  the  decision  sequence. 

(c)  Compute  the  ranks  of  5,6,1,3,4,2  and  6,2,4,1,3,5. 

(d)  Compute the sequence of decisions for permutations of 6 which have ranks 151 and 300. What
are the one line forms of the permutations?

3.2.6.  Write computer code for procedures to calculate RANK and UNRANK. You might store the function
being ranked as an array of integers where the first component is the size of the domain. The classes
of functions for which you should write procedures are:

(a)  the strictly decreasing functions in $\underline{n}^{\underline{k}}$ in lex order;

(b)  the nonincreasing functions in $\underline{n}^{\underline{k}}$ in lex order;

(c)  the permutations of n in insertion order;

(d)  the permutations of n in lex order.

Hint.  For the first two, it might be useful to have a procedure for calculating the binomial coefficients
C(a, b).

3.2.7.  Returning  to  Example  3.12,  draw  the  decision  tree  for  n  =  4  and  list  the  Gray  code  for  n  =  5. 

3.2.8.  Returning to Example 3.12, compute the ranks of the following sequences in the Gray code for subsets
of n:

0110;     1001;     10101;     1000010;     01010101.
(The value of n is just the length of the sequence.)

3.2.9.  Returning to Example 3.12, for each of the following values of n and k, find the n-string of rank k in
the Gray code for subsets of n:

n = 20, k = 0;    n = 20, k = 2^^;    n = 4, k = 7;    n = 8, k = 200.


*3.2.10.  We will write the permutations of n in one line form. Two permutations will be considered adjacent
if one can be obtained from the other by interchanging the elements in two adjacent positions. We
want a Gray code for all permutations. Here is such a code for n = 4. You should read down the first
column, then the second and finally the third.

        1,2,3,4        3,1,2,4        2,3,1,4
        1,2,4,3        3,1,4,2        2,3,4,1
        1,4,2,3        3,4,1,2        2,4,3,1
        4,1,2,3        4,3,1,2        4,2,3,1
        4,1,3,2        4,3,2,1        4,2,1,3
        1,4,3,2        3,4,2,1        2,4,1,3
        1,3,4,2        3,2,4,1        2,1,4,3
        1,3,2,4        3,2,1,4        2,1,3,4

List a Gray code for n = 5. As a challenge, describe a method for listing a Gray code for general
n.


*3.3  Backtracking


In many computer algorithms it is necessary to systematically inspect all the vertices of a decision
tree. A procedure that systematically inspects all the vertices is called a traversal of the tree.
How can we create such a procedure? One way to imagine doing this is to walk around the tree. An
example is shown in Figure 9.2 (p. 249), where we study the subject in more depth. "Walking around
the tree" is not a very good program description. We can describe our traversal more precisely by
giving an algorithm. Here is one which traverses a tree whose leaves are associated with functions
and lists the functions in the order of their rank.

Theorem 3.5  Systematic traversal algorithm  The following procedure systematically
visits the leaves in a tree from left to right by "walking" around the tree.

1.  Start:  Mark  all  edges  as  unused  and  position  yourself  at  the  root. 

2.  Leaf:  If  you  are  at  a  leaf,  list  the  function. 

3.  Decide  case:  If  there  are  no  unused  edges  leading  out  from  the  vertex,  go  to  Step  4;  otherwise, 
go  to  Step  5. 

4. Backtrack: If you are at the root, STOP; otherwise, return to the vertex just above this
one  and  go  to  Step  3. 

5.  Decision:   Select  the  leftmost  unused  edge  out  of  this  vertex,  mark  it  used,  follow  it  to  a 
new  vertex  and  go  to  Step  2. 

If  you  cannot  easily  visualize  the  entire  route  followed  by  this  algorithm,  take  the  time  now  to  apply 
this algorithm to the decision tree for $\underline{2}^{\underline{3}}$ shown in Figure 3.1 and verify that it produces all eight
functions  in  lex  order. 

Because Step 1 refers to all edges, the algorithm may be impractical in its present form. This
can be overcome by keeping a "decision sequence" which is updated as we move through the tree.
Here's  the  modified  algorithm. 

Theorem  3.6  Systematic  traversal  algorithm  (programming  version)  The  following 
procedure  systematically  visits  the  leaves  in  a  tree  from  left  to  right  by  "walking"  around  the 
tree. 

1. Start: Initialize the decision sequence to −1 and position yourself at the root.

2. Leaf: If you are at a leaf, list the function.

3. Decide case: Increase the last entry in the decision sequence by 1. If the new value equals
the number of decisions that can be made at the present vertex, go to Step 4; otherwise go
to Step 5.

4. Backtrack: Remove the last entry from the decision sequence. If the decision sequence is
empty, STOP; otherwise go to Step 3.

5. Decision: Make the decision indicated by the last entry in the decision sequence, append
−1 to the decision sequence and go to Step 2.
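The five steps translate almost line for line into a small state machine. The sketch below is one possible rendering, under two assumptions that are not in the text: the tree is described by a hypothetical function `num_decisions(path)` giving the number of decisions available at the vertex reached by the decision tuple `path` (0 at a leaf), and "list the function" is replaced by calling `visit(path)`.

```python
def traverse(num_decisions, visit):
    """Visit the leaves of a decision tree from left to right.

    The present vertex is identified by seq[:-1]; the last entry of seq is
    the counter for the decision currently being tried at that vertex.
    """
    seq = [-1]                                   # Step 1: start at the root
    step = 2
    while True:
        here = tuple(seq[:-1])
        if step == 2:                            # Step 2: leaf?
            if num_decisions(here) == 0:
                visit(here)
            step = 3
        elif step == 3:                          # Step 3: decide case
            seq[-1] += 1
            step = 4 if seq[-1] == num_decisions(here) else 5
        elif step == 4:                          # Step 4: backtrack
            seq.pop()
            if not seq:
                return
            step = 3
        else:                                    # Step 5: decision
            seq.append(-1)
            step = 2


# A tree with three levels of two decisions each, so eight leaves.
leaves = []
traverse(lambda path: 2 if len(path) < 3 else 0, leaves.append)
print(leaves[:3])   # [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
```

Decisions are numbered from 0 here, so a leaf such as (0, 1, 0) would correspond to the function 1,2,1 if the decisions stand for the values 1 and 2.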

In  both  versions  of  the  algorithm,  Step  4  is  labeled  "Backtrack."  What  does  this  mean?  If  you  move 
your  pencil  around  a  tree,  this  step  would  correspond  to  going  toward  the  root  on  an  edge  that  has 
already  been  traversed  in  the  opposite  direction.  In  other  words,  backtracking  refers  to  the  process 
of moving along an edge back toward the root of the tree. Thinking in terms of the decision sequence,
backtracking  corresponds  to  undoing  (i.e.,  backtracking  on)  what  is  currently  the  last  decision  made. 

If we understand the structure of the decision tree well enough, we can avoid parts of the
algorithm. In the next example, we eliminate the descent since only one element in the Gray code changes
each time. In the example after that, we eliminate the decision sequence for strictly decreasing
functions. In both cases, we use empty to keep track of the state of the decision sequence. Also, Step 2
is avoided until we know we are at a leaf.

Example 3.13 Listing Gray coded subsets In Example 3.12 we looked at a Gray code for
listing all subsets of an n element set. Since there are only two decisions at each vertex, the entries
in the decision sequence will be 0 or 1 (but they usually do not equal the entries in the Gray code).
Here is the code with $s_i$ being the decision sequence and $g_i$ the Gray code.


Procedure GraySubsets(n)
    For i = 1 to n:                         /* Set up first leaf */
        s_i = 0
        g_i = 0
    End for
    empty = FALSE
    While (empty = FALSE):
        Output g_1, ..., g_n.               /* Step 2 */
        For i from n to 1 by −1:            /* Steps 3 & 4 (moving up) */
            If (s_i = 0) then
                s_i = 1                     /* Step 5 (moving right) */
                g_i = 1 − g_i
                For j from i + 1 to n:      /* Steps 3 & 5 (moving down) */
                    s_j = 0
                End for
                Goto ENDCASE
            End if
        End for
        empty = TRUE
        Label ENDCASE
    End while
End

The statement $g_i = 1 - g_i$ changes 0 to 1 and vice versa. Since the Gray code changes only one entry,
we are able to move down without changing any of the $g_j$ values.
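For readers who want to run the procedure, here is a minimal Python sketch of it. The generator name `gray_subsets` and the 0-based indexing are choices made for this illustration. Each later decision entry is reset to 0, the leftmost decision, when we move down. Whether the order produced agrees entry for entry with Example 3.12 depends on the conventions used there; the sketch only guarantees that successive outputs differ in one position.

```python
def gray_subsets(n):
    """Yield 0/1 tuples of length n so that successive tuples differ in one entry."""
    s = [0] * n          # decision sequence
    g = [0] * n          # current Gray code word (first leaf)
    while True:
        yield tuple(g)                       # Step 2
        for i in range(n - 1, -1, -1):       # Steps 3 & 4 (moving up)
            if s[i] == 0:
                s[i] = 1                     # Step 5 (moving right)
                g[i] = 1 - g[i]
                for j in range(i + 1, n):    # Steps 3 & 5 (moving down)
                    s[j] = 0
                break
        else:
            return                           # decision sequence is empty


for word in gray_subsets(3):
    print("".join(map(str, word)))
# 000 001 011 010 110 111 101 100 (one per line)
```

The `for ... else` construct plays the role of the `empty` flag and the `Goto ENDCASE` in the pseudocode: the `else` branch runs only when no entry of the decision sequence could be advanced.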

You  may  find  this  easier  to  understand  than  the  presentation  in  Example  3.12.  In  that  case,  why 
don't  we  just  throw  that  example  out  of  the  text?  Although  the  code  may  be  easy  to  follow,  there 
is  no  reason  to  believe  that  it  lists  all  subsets.  Example  3.12  contains  the  proof  that  the  algorithm 
does list all subsets, and the proof given there requires the recursive construction of the Gray code
based on reversing the order of the output for subsets of an n − 1 element set. Once we know that
it  works,  we  do  the  backtracking  using  the  fact  that  only  one  gi  is  changed  at  each  output.  Thus, 
we  can  forget  about  the  construction  of  the  Gray  code  as  soon  as  we  know  that  it  works  and  a  little 
about the decision tree. □

Example 3.14 Listing strictly decreasing functions In Example 3.8 we studied ranking and
unranking for strictly decreasing functions from k to n, which correspond to the k element subsets
of n. Now we want to list the subsets by traversing the decision tree.

Suppose we are at the subset given by $d_1 > \cdots > d_k$. Recall that, since the $d_i$ are strictly
decreasing and $d_k \ge 1$, we must have $d_i \ge k + 1 - i$. We must back up from the leaf until we find
a vertex that has unvisited children. If this is the (i − 1)st vertex in the list, its next child after $d_i$
will be $d_i - 1$ and the condition just mentioned requires that $d_i - 1 \ge k + 1 - i$. Here is the code for
the traversal.

Procedure Subsets(n,k)
    For i from 1 to k:   d_i = n + 1 − i   End for
    empty = FALSE
    While (empty = FALSE):
        Output d_1, ..., d_k.               /* Step 2 */
        For i from k to 1 by −1:            /* Steps 3 & 4 (moving up) */
            If (d_i > k + 1 − i) then
                d_i = d_i − 1               /* Step 5 (moving right) */
                For j from i + 1 to k:
                    d_j = d_{j−1} − 1       /* Steps 3 & 5 (moving down) */
                End for
                Goto ENDCASE
            End if
        End for
        empty = TRUE
        Label ENDCASE
    End while
End □
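The traversal for strictly decreasing functions can be sketched in Python in the same way. The name `decreasing_subsets` is hypothetical and indices are 0-based, so the test $d_i > k + 1 - i$ becomes `d[i] > k - i`. After an entry is decreased, this sketch resets each later entry to one less than its predecessor, which is the largest choice that keeps the sequence strictly decreasing.

```python
def decreasing_subsets(n, k):
    """Yield the k-subsets of {1,...,n} as strictly decreasing tuples,
    starting from (n, n-1, ..., n-k+1) and ending at (k, ..., 2, 1)."""
    if k > n:
        return                                # no k-subsets exist
    d = [n - i for i in range(k)]             # first leaf
    while True:
        yield tuple(d)                        # Step 2
        for i in range(k - 1, -1, -1):        # Steps 3 & 4 (moving up)
            if d[i] > k - i:
                d[i] -= 1                     # Step 5 (moving right)
                for j in range(i + 1, k):     # Steps 3 & 5 (moving down)
                    d[j] = d[j - 1] - 1
                break
        else:
            return


print(list(decreasing_subsets(4, 2)))
# [(4, 3), (4, 2), (4, 1), (3, 2), (3, 1), (2, 1)]
```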

So  far  in  our  use  of  decision  trees,  it  has  always  been  clear  what  decisions  are  reasonable;  i.e., 
lead,  after  further  decisions,  to  a  solution  of  our  problem.  This  is  because  we've  looked  only  at  simple 
problems  such  as  generating  all  permutations  of  n  or  generating  all  strictly  decreasing  functions  in 
$\underline{n}^{\underline{k}}$. Consider the following problem.

How many permutations f of n are such that

$$|f(i) - f(i+1)| \le 3 \quad \text{for } 1 \le i < n? \tag{3.3}$$

It's  not  at  all  obvious  what  decisions  are  reasonable  in  this  case.  For  instance,  when  n  =  9,  the 
partially  specified  one  line  function  124586  cannot  be  completed  to  a  permutation. 

There is a simple cure for this problem: We will allow ourselves to make decisions which lead
to  "dead  ends,"  situations  where  we  cannot  continue  on  to  a  solution.  With  this  expanded  notion 
of  a  decision  tree,  there  are  often  many  possible  decision  trees  that  appear  reasonable  for  doing 
something.  We'll  look  at  this  a  bit  in  a  minute.  For  now,  let's  look  at  our  problem  (3.3).  Suppose 
that  we're  generating  things  in  lex  order  and  we've  reached  the  vertex  12458.  What  do  we  do  now? 
We'll  simply  continue  to  generate  more  of  the  permutation,  making  sure  that  (3.3)  is  satisfied  for 
that portion of the permutation we have generated so far. The resulting portion of the tree that
starts  at  1,2,4,5,8  is  shown  in  Figure  3.12.  Each  vertex  is  labeled  with  the  part  of  the  permutation 
after  12458.  The  circled  leaves  are  solutions. 
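The idea of extending a partial permutation while checking the condition only on the part built so far can be sketched as a recursive generator. This is an illustration, not the text's algorithm. It assumes the condition in (3.3) is that adjacent values differ by at most 3, and the names `bounded_step_permutations`, `prefix` and `bound` are introduced here. A call that yields nothing corresponds to a subtree consisting only of dead ends.

```python
def bounded_step_permutations(n, prefix=(), bound=3):
    """Yield, in lex order, the permutations of 1..n that extend `prefix`
    and whose adjacent values differ by at most `bound`.
    `prefix` is assumed to satisfy the condition already."""
    if len(prefix) == n:
        yield prefix                      # a leaf that is a solution
        return
    for value in range(1, n + 1):         # try decisions from left to right
        if value in prefix:
            continue                      # not a permutation
        if prefix and abs(prefix[-1] - value) > bound:
            continue                      # violates the step condition
        yield from bounded_step_permutations(n, prefix + (value,), bound)


# The partial permutation 124586 for n = 9 is a dead end ...
print(list(bounded_step_permutations(9, (1, 2, 4, 5, 8, 6))))   # []
# ... while the subtree starting at 12458 contains two solutions.
print(list(bounded_step_permutations(9, (1, 2, 4, 5, 8))))
# [(1, 2, 4, 5, 8, 7, 9, 6, 3), (1, 2, 4, 5, 8, 9, 7, 6, 3)]
```

Counting solutions for a given n is then `sum(1 for _ in bounded_step_permutations(n))`.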

Our  tree  traversal  algorithm  given  at  the  start  of  this  section  requires  a  slight  modification  to 
cover  our  extended  decision  tree  concept  where  a  leaf  need  not  be  a  solution:  Change  Step  2  to 

2'.  Solution?:  If  you  are  at  a  solution,  take  appropriate  action. 

How  can  there  be  more  than  one  decision  tree  for  generating  solutions  in  a  specified  order? 
Suppose  someone  who  was  not  very  clever  wanted  to  generate  all  permutations  of  n  in  lex  order. 


Figure 3.12 The portion of the decision tree for (3.3) that begins 1,2,4,5,8. The decision that led to a
vertex is placed at the vertex rather than on the edge. The circled leaves are solutions.


He might program a computer to generate all functions in $\underline{n}^{\underline{n}}$ in lex order and to then discard those
functions which are not permutations. This leads to a much bigger tree because $n^n$ is much bigger
than $n!$, even when n is as small as 3. A somewhat cleverer friend might suggest that he have the
program check to see that $f(k) \ne f(k-1)$ for each k > 1. This won't slow down the program very
much and will lead to only $n(n-1)^{n-1}$ functions. Thus the program should run faster. Someone
else might suggest that the programmer check at each step to see that the function produced so far
is  an  injection.  If  this  is  done,  nothing  but  permutations  will  be  produced,  but  the  program  may  be 
much  slower.  Someone  more  knowledgeable  might  suggest  a  way  to  convert  a  decision  sequence  to  a 
permutation  using  the  ideas  of  the  previous  section.  This  would  give  the  smallest  possible  tree  and 
would  not  take  very  long  to  run.  You  might  try  to  figure  out  how  to  do  that. 

The  lesson  to  be  learned  from  the  previous  paragraph  is  that  there  is  often  a  trade  off  between 
the  size  of  the  decision  tree  and  the  time  that  must  be  spent  at  each  vertex  determining  what 
decisions to allow. (For example, the decision to allow only those values for $f(k)$ which satisfy
$f(k) \ne f(k-1)$.) Because of this, different people may develop different decision trees for the
same  problem.  The  differences  between  computer  run  times  for  different  decision  trees  can  be  truly 
enormous.  By  carefully  limiting  the  decisions,  people  have  changed  problems  that  were  too  long  to 
run  on  a  supercomputer  into  problems  that  could  be  easily  run  on  a  personal  computer. 
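The size differences between the three decision trees for permutations are easy to measure for small n. The sketch below is an illustration with hypothetical names (`count_leaves` and the three predicates). It counts the leaves of the tree obtained when a value is accepted only if a given test passes, and the counts agree with $n^n$, $n(n-1)^{n-1}$ and $n!$.

```python
def count_leaves(n, allowed):
    """Count the length-n sequences over 1..n that can be built one entry
    at a time when value v may follow `prefix` only if allowed(prefix, v)."""
    def walk(prefix):
        if len(prefix) == n:
            return 1
        return sum(walk(prefix + (v,))
                   for v in range(1, n + 1) if allowed(prefix, v))
    return walk(())


def anything(prefix, v):
    return True                               # all functions


def differs_from_previous(prefix, v):
    return not prefix or prefix[-1] != v      # f(k) != f(k-1)


def injective(prefix, v):
    return v not in prefix                    # only permutations survive


for n in (3, 4, 5):
    print(n, count_leaves(n, anything),
          count_leaves(n, differs_from_previous),
          count_leaves(n, injective))
# 3 27 12 6
# 4 256 108 24
# 5 3125 1280 120
```

The injectivity test does more work per vertex (a scan of the prefix) but visits far fewer vertices, which is the trade off described above.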

We'll  conclude  this  section  with  two  examples  of  backtracking  of  the  type  just  discussed. 

Example 3.15 Latin squares An n × n Latin square is an n × n array in which each element
of n appears exactly once in each row and column. Let L(n) be the number of n × n Latin Squares.
Finding a simple formula, or even a good estimate, for L(n) is an unsolved problem. How can we
use backtracking to compute L(n) for small values of n?

The number of Latin Squares increases rapidly with n, so anything we can do to reduce the size
of the decision tree will be a help. Here's a way to cut our work by a factor of n!. Let's agree to
rearrange the columns of a Latin Square so that the first row always reads 1, 2, 3, ..., n. We'll say
such a square is "first row ordered." Given a first row ordered square, we can permute the columns
in n! ways, each of which leads to a different Latin Square. Hence L(n) is n! times the number of
first row ordered Latin Squares.

By  next  rearranging  the  rows,  we  can  get  the  entries  in  the  first  column  in  order,  too.  If  we're 
sloppy, we might think this gives us another factor of n!. This is not true because 1 is already at
the top of the first column due to our ordering of the first row. Hence only the second through nth
positions are arbitrary and so we have a factor of (n − 1)!.

Let's organize what we've got now. We'll call a Latin Square in standard form if the entries in
the first row are in order and the entries in the first column are in order. Each n × n Latin Square
in standard form is associated with n! (n − 1)! Latin Squares which can be obtained from it by
permuting all rows but the first and then permuting all columns. The standard form Latin Squares
for n ≤ 4 are shown in Figure 3.13. It follows that L(1) = 1 × 1! 0! = 1, L(2) = 1 × 2! 1! = 2,
L(3) = 1 × 3! 2! = 12 and L(4) = 4 × 4! 3! = 576.

1    1 2    1 2 3    1 2 3 4    1 2 3 4    1 2 3 4    1 2 3 4
     2 1    2 3 1    2 1 4 3    2 1 4 3    2 3 4 1    2 4 1 3
            3 1 2    3 4 1 2    3 4 2 1    3 4 1 2    3 1 4 2
                     4 3 2 1    4 3 1 2    4 1 2 3    4 3 2 1

Figure 3.13 The standard Latin squares for n ≤ 4.

      root
     /    \
    1      3      choice for (2,2) entry
           |
           1      choice for (2,3) entry
           |
           1      choice for (3,2) entry
           |
           2      choice for (3,3) entry

Figure 3.14 A decision tree for 3 × 3 Latin squares.

We'll set up the decision tree for generating Latin Squares in standard form. Fill in the entries
in the array in the order (2,2), (2,3), ..., (2,n), (3,2), ..., (3,n), ..., (n,n); that is, in the order we
read: left to right and top to bottom. The decision tree for n = 3 is shown in Figure 3.14, where
each vertex is labeled with the decision that leads to it. The leaf on the bottom right corresponds
to the standard form Latin Square for n = 3. The leaf on the left is the end of a path that has died
because there is no way to choose entry (2,3). Why is this? At this point, the second row contains
1 and 2, and the third column contains 3. The (2,3) entry must be different from all of these, which
is impossible. □
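The backtracking count for standard form Latin Squares can be sketched as follows. This is an illustrative implementation, not the text's own program: the function names are hypothetical, the first row and first column are fixed in order, and the remaining cells are filled left to right and top to bottom, with a value allowed only if it does not already occur in the same row or column. The total count L(n) is then recovered from the factor n! (n − 1)! derived above.

```python
from math import factorial


def count_standard_latin_squares(n):
    """Count n x n Latin squares whose first row and first column read 1..n."""
    square = [[0] * n for _ in range(n)]
    for j in range(n):
        square[0][j] = j + 1          # first row in order
        square[j][0] = j + 1          # first column in order
    cells = [(r, c) for r in range(1, n) for c in range(1, n)]

    def fill(pos):
        if pos == len(cells):
            return 1                  # reached a leaf that is a solution
        r, c = cells[pos]
        # values already used to the left in this row or above in this column
        used = set(square[r][:c]) | {square[i][c] for i in range(r)}
        total = 0
        for v in range(1, n + 1):
            if v not in used:
                square[r][c] = v
                total += fill(pos + 1)
        square[r][c] = 0              # undo the decision when backtracking
        return total

    return fill(0)


def count_latin_squares(n):
    """L(n) = (number in standard form) * n! * (n-1)!."""
    return count_standard_latin_squares(n) * factorial(n) * factorial(n - 1)


print([count_standard_latin_squares(n) for n in range(1, 5)])   # [1, 1, 1, 4]
print(count_latin_squares(4))                                   # 576
```

A dead end shows up here as a cell for which every value is in `used`, so the loop adds nothing to `total`.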


Example 3.16 Arranging dominoes A backtracking problem is stated in Example 4 at the
beginning of this part: We asked for the number of ways to systematically arrange 32 dominoes on
a chessboard so that each one covers exactly two squares and every square is covered.

If the squares of the board are numbered systematically from 1 to 64, we can describe any
placement of dominoes by a sequence of 32 h's and v's: Place dominoes sequentially as follows. If the
first unused element in the sequence is h, place a horizontal domino on the first unoccupied square
and the square to its right. If the first unused element in the sequence is v, place a vertical domino
on the first unoccupied square and the square just below it. Not all sequences correspond to legal
arrangements because some lead to overlaps or to dominoes off the board. For a 2 × 2 board, the
only legal sequences are hh and vv. For a 2 × 3 board, the legal sequences are hvh, vhh and vvv. For
a 3 × 4 board, there are eleven legal sequences as shown in Figure 3.15.

To find these sequences in lex order we used a decision tree for generating sequences of h's and
v's in lex order. Each decision is required to lead to a domino that lies entirely on the board and
does not overlap another domino. The tree is shown in Figure 3.16. Each vertex is labeled with the
choice that led to the vertex. The leaf associated with the path vhvvv does not correspond to a
covering. It has been abandoned because there is no way to place a domino on the lower left square
of the board, which is the first free square. Draw a picture of the board to see what is happening.

hhhhhh    hhhvvh    hhvhvh    hhvvhh    hhvvvv    hvvhhh
hvvvvh    vhvhhh    vvhhhh    vvhvvh    vvvvhh

Figure 3.15 The domino coverings of a 3 × 4 board in lex order.

Figure 3.16 The lex order decision tree for domino coverings of a 3 × 4 board. The decision that led to a
vertex is placed at the vertex rather than on the edge.

Our systematic traversal algorithm can be used to traverse the decision tree without actually
drawing it; however, there is a slight difficulty: It is not immediately apparent what the possible
decisions at a given vertex are. This is typical in backtracking problems. (We slid over it in the
previous example.) A common method for handling the problem is to keep track of the nature of
the decision (h or v) rather than its numerical value. For example, suppose our decision sequence is
v, h, −1 and we are at Step 3 (Decide case). We attempt to increase −1 to h and find that this is
impossible because it results in a domino extending off the board. Thus we try to increase it to v,
which is acceptable. After a few more moves through the tree, we arrive at Step 3 with the sequence
v, h, v, v, v, −1. When we try to increase −1 to h, the domino overlaps another. When we try v, the
domino extends off the board. Hence the vertex is a leaf but not a solution, so we go to Step 4. □
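The domino example can be sketched as a recursive backtracking generator. The code is an illustration under stated assumptions: squares are scanned row by row from the top left, "first unoccupied square" means the first one in that scan, and the names `domino_sequences`, `fits` and `place` are introduced here. Trying h before v at every vertex produces the legal sequences in lex order.

```python
def domino_sequences(rows, cols):
    """Yield, in lex order, the h/v sequences that cover a rows x cols board."""
    covered = [[False] * cols for _ in range(rows)]
    seq = []

    def first_free():
        for r in range(rows):
            for c in range(cols):
                if not covered[r][c]:
                    return r, c
        return None

    def partner(r, c, kind):
        # second square of a domino placed at (r, c)
        return (r, c + 1) if kind == "h" else (r + 1, c)

    def fits(r, c, kind):
        r2, c2 = partner(r, c, kind)
        return r2 < rows and c2 < cols and not covered[r2][c2]

    def place(r, c, kind, value):
        r2, c2 = partner(r, c, kind)
        covered[r][c] = covered[r2][c2] = value

    def walk():
        spot = first_free()
        if spot is None:
            yield "".join(seq)            # every square is covered
            return
        r, c = spot
        for kind in "hv":                 # h before v gives lex order
            if fits(r, c, kind):
                place(r, c, kind, True)
                seq.append(kind)
                yield from walk()
                seq.pop()                 # backtrack
                place(r, c, kind, False)

    yield from walk()


print(list(domino_sequences(2, 3)))       # ['hvh', 'vhh', 'vvv']
print(len(list(domino_sequences(3, 4))))  # 11
```

A vertex where neither h nor v fits, such as the one reached by vhvvv on the 3 × 4 board, simply yields nothing and control returns to its parent.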


Exercises 


3.3.1.  Draw  a  decision  tree  for  the  4x4  standard  form  Latin  Squares. 

3.3.2.  The  "n  queens  problem"  is  to  place  n  queens  on  an  n  x  n  chessboard  so  that  no  queen  attacks 
another. For those not familiar with chess, one queen attacks another if it is in the same row, column
or diagonal. Convince yourself that the 2 queens and 3 queens problems have no solutions and that
the  4  queens  problem  has  two  solutions.  A  necessary,  but  not  sufficient,  condition  for  a  solution  is 
that  each  row  contain  a  queen.  Thus  we  can  imagine  a  sequence  of  n  decisions  to  obtain  a  solution: 
The  ith  decision  is  where  to  place  a  queen  in  the  ith  row. 

(a)  Find  the  2  solutions  to  the  4  queens  problem  by  drawing  a  decision  tree. 

(b)  Draw  a  decision  tree  for  the  5  queens  problem.  How  many  solutions  are  there? 

(c) How many solutions are there to the 6 queens problem?

3.3.3. Draw a decision tree for covering the following board with dominoes.


3.3.4. Draw a decision tree for covering a 3 × 3 board using the domino and the L-shaped 3x3 square


Notes  and  References 


Although  listing  is  a  natural  adjunct  to  enumeration,  it  has  not  received  as  much  attention  as 
enumeration  until  recently.  Listing,  ranking  and  unranking  have  emerged  as  important  topics  in 
connection  with  the  development  of  algorithms  in  computer  science.  These  are  now  active  research 
areas  in  combinatorics. 

The  text  [1]  discusses  decision  trees  from  a  more  elementary  viewpoint  and  includes  applications 
to  conditional  probability  calculations.  The  text  by  Stanton  and  White  [3]  is  at  the  level  of  this 
chapter.  Williamson  [5;  Chs.1,3]  follows  the  same  approach  as  we  do,  but  explores  the  subject  more 
deeply. A different point of view is taken by Nijenhuis and Wilf [2,4].

CHAPTER 4


Sieving  Methods 


Introduction 


A  "sieving  method"  is  a  technique  that  allows  us  to  count  or  list  some  things  indirectly.  After  a  few 
words about organization and difficulty, we'll introduce the two sieving methods discussed in this
chapter. 

•  The  sections  of  this  chapter  are  independent  of  each  other.  Thus,  if  your  instructor  assigns 
only  the  material  on  the  Principle  of  Inclusion  and  Exclusion,  you  need  not  read  the  sections 
on  structures  with  symmetries.  You  may  also  read  the  material  on  counting  structures  with 
symmetries  without  reading  the  material  on  listing  them. 

•  The  material  in  this  chapter  is  more  difficult  than  the  first  three  chapters  in  this  part.  Since  the 
material  here  is  not  needed  until  Part  IV,  it  may  be  postponed. 


Structures  Lacking  Things 


In  Section  4.1  we  look  at  the  problem  of  counting  structures  that  lack  certain  things;  e.g.,  lists  with 
no  repeated  elements  or  permutations  with  no  fixed  points.  Sometimes,  as  in  the  case  of  lists  with 
no  repeated  elements,  it  is  easy  to  count  the  structures  directly.  That  situation  is  not  of  interest 
here.  Instead,  we'll  examine  what  happens  when  it's  fairly  easy  to  count  structures  which  have 
some  of  the  properties  but  hard  to  count  those  which  have  none  of  the  properties.  For  example, 
consider permutations of n and a set $\{F_1, \ldots, F_n\}$ of n properties where $F_i$ is the property that the
permutation fixes i; that is, maps i to i. Suppose our problem is to count permutations with none of
the properties; that is, permutations with no fixed points. This is hard. However, it is fairly easy to
count permutations whose fixed points include some specified set S; that is, permutations that have
at least the properties $\{F_i \mid i \in S\}$. These counts can be used to indirectly solve the original
problem  by  using  the  "Principle  of  Inclusion  and  Exclusion." 

The Principle of Inclusion and Exclusion can be extended in various ways. We briefly indicate
some of these at the end of Section 4.1.

Structures with Symmetries


At  the  end  of  Example  1.12  (p.  13),  we  asked  how  many  ways  we  could  form  a  six  long  circular 
sequence  using  ones  and  twos  and  found  that  we  could  not  solve  it.  In  this  section  we'll  develop  the 
necessary  tools. 

The  circular  sequence  problem  is  difficult  because  "symmetries  induce  equivalences."  What  does 
this  mean?  The  sequence  121212  looks  the  same  if  it  is  circularly  shifted  two  positions.  This  is  a 
symmetry  of  the  sequence.  Several  sequences  correspond  to  the  same  circular  sequence  of  ones  and 
twos. We say these lists are "equivalent." Thus, as we saw in Example 1.12, the three sequences
112112,  121121  and  211211  are  equivalent.  We  can  find  a  sequence  equivalent  to  a  given  one  by 
reading  the  given  sequence  "circularly:"  Start  reading  at  any  point.  At  the  end  of  the  sequence  jump 
to  the  start  and  continue  until  you  return  to  where  you  began  reading. 

To  list  the  circular  sequences,  we  need  a  list  C  of  sequences  such  that  every  sequence  is  equivalent 
to  exactly  one  sequence  in  C.  Thus,  exactly  one  of  the  sequences  112112,  121121  and  211211  would 
appear  in  C.  Counting  the  circular  sequences  means  finding  \C\.  We'll  discuss  listing  first  and  then 
counting. 
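For very short sequences, a list C of representatives can be produced by brute force, which gives a concrete picture of what the later sections compute more cleverly. The sketch below is an illustration only and is not the method developed in this chapter: it assumes the representative of each class is chosen to be its lexicographically least rotation, and the names `canonical` and `circular_sequences` are hypothetical.

```python
from itertools import product


def canonical(seq):
    """Lexicographically least sequence obtained by reading `seq` circularly."""
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def circular_sequences(length, symbols="12"):
    """One representative for each class of equivalent sequences."""
    reps = {canonical("".join(p)) for p in product(symbols, repeat=length)}
    return sorted(reps)


print(canonical("121121"), canonical("211211"))   # 112112 112112
print(len(circular_sequences(6)))                  # 14
```

This approach examines all $2^6$ sequences, so it does not scale; it only serves as a check on small cases.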

We  have  already  dealt  with  one  important  case  of  symmetries,  namely,  when  our  structures  are 
lists and we are allowed to permute the items in the list in any fashion whatsoever. In other words,
two  lists  are  the  same  if  they  can  be  made  identical  by  permuting  the  elements  in  one  of  them.  In 
fact,  this  case  is  so  important  that  it  has  a  name:  multisets.  (Remember  that  a  multiset  is  simply  a 
list  where  order  is  irrelevant.)  If  the  elements  of  the  multiset  can  be  ordered,  then  we  can  take  our 
representatives  C  to  be  a  collection  of  nondecreasing  functions.  This  was  discussed  in  Section  2.3. 

In  Section  4.2  we'll  look  at  the  problem  of  listing  structures  when  symmetries  are  present.  This 
is  much  like  the  nonmathematical  notion  of  a  sieve:  all  that  comes  through  the  sieve  are  "canonical" 
representations  of  the  structures.  Decision  trees  play  an  important  role. 

In Section 4.3 we'll look at the problem of counting, rather than listing, these structures.
"Burnside's Lemma" provides us with an indirect method for doing this.


4.1   The  Principle  of  Inclusion  and  Exclusion 


Imagine that a professor on the first day of class wants to obtain information on the course background
of the students. He wants to know what number of students have had Math 21, what number
have had Comp Sci 13 and various combinations such as "Comp Sci 13 but not Math 21." For some
reason, to calculate these numbers the professor asks just the following three questions.

"How  many  of  you  have  had  Math  21?" 

"How  many  of  you  have  had  Comp  Sci  13?" 

"How  many  of  you  have  had  Comp  Sci  13  and  Math  21?" 

Suppose  the  number  of  students  is  15,  12  and  8,  respectively. 

Can  the  professor  now  determine  answers  to  all  other  possible  questions  concerning  having  taken 
or  not  taken  these  courses?  Let's  look  at  a  couple  of  possibilities. 

How  many  have  had  Comp  Sci  13  but  not  Math  21?  Of  the  12  students  who  have  had  the  first 
course, 8 have had the second and so 12 − 8 = 4 of them have not had the second.

How  many  students  have  had  neither  course?  That  will  depend  on  the  total  number  of  students 
in the class. Suppose there are 30 students in the class. We might think that 30 − 15 − 12 = 3 of
them  have  had  neither  course.  This  is  not  correct  because  the  students  who  had  both  courses  were 
subtracted  off  twice.  To  get  the  answer,  we  must  add  them  back  in  once.  The  result  is  that  there 
are 3 + 8 = 11 students who have had neither course.


We can rephrase the previous discussion in terms of sets. Let S be the set of students in the
class, $S_1$ the subset who have had Math 21 and $S_2$ the subset who have had Comp Sci 13. The
information that was obtained by questioning the class can be written as

$$|S| = 30, \quad |S_1| = 15, \quad |S_2| = 12 \quad \text{and} \quad |S_1 \cap S_2| = 8,$$

where $S_1 \cap S_2$ denotes the intersection of the sets $S_1$ and $S_2$. We saw that the number of students
who had neither course is given by

$$|S| - (|S_1| + |S_2|) + |S_1 \cap S_2|. \tag{4.1}$$

How  can  this  result  be  extended  to  more  than  two  classes?  The  answer  is  provided  by  the  following 
theorem.  After  stating  it,  we'll  see  how  it  can  be  applied  before  proving  it. 

Theorem 4.1 Principle of Inclusion and Exclusion Let $S_1, S_2, \ldots, S_m$ be subsets of a
set S. Let $N_0 = |S|$ and, for r > 0, let

$$N_r = \sum |S_{i_1} \cap \cdots \cap S_{i_r}|, \tag{4.2}$$

where the sum is over all r-long strictly increasing sequences chosen from m; that is, $\{i_1, \ldots, i_r\}$
ranges over all r-subsets of m. The number of elements in S that are not in any of $S_1, \ldots, S_m$ is

$$\sum_{i=0}^{m} (-1)^i N_i = N_0 - N_1 + N_2 - \cdots + (-1)^m N_m. \tag{4.3}$$

When m = 2, the one long sequences are 1 and 2, giving $N_1 = |S_1| + |S_2|$. The only two long sequence
is 1,2 and so $N_2 = |S_1 \cap S_2|$. Thus (4.3) reduces to (4.1) in this case.

As  we  saw  in  an  earlier  chapter,  strictly  increasing  sequences  are  equivalent  to  subsets.  Also, 
the  order  in  which  we  do  intersections  of  sets  does  not  matter.  (Just  as  the  order  of  addition  does 
not  matter  and  the  order  of  multiplication  does  not  matter.)  This  explains  why  we  could  have  said 
that the sum defining $N_r$ was over all r-subsets $\{i_1, \ldots, i_r\}$ of m.
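The theorem can be checked directly on small explicit sets. The following sketch is an illustration with hypothetical names (`level_sums`, `count_in_none`): it computes $N_0, \ldots, N_m$ by summing intersection sizes over all r-subsets of the given sets and then forms the alternating sum. The sample data are an arbitrary assignment of 30 numbered students chosen to match the class example (15 with Math 21, 12 with Comp Sci 13, 8 with both).

```python
from itertools import combinations


def level_sums(universe, subsets):
    """Return [N_0, N_1, ..., N_m] for the given subsets of `universe`."""
    sums = [len(universe)]                          # N_0 = |S|
    for r in range(1, len(subsets) + 1):
        total = 0
        for chosen in combinations(subsets, r):     # all r-subsets of the sets
            total += len(set.intersection(*chosen))
        sums.append(total)
    return sums


def count_in_none(universe, subsets):
    """Number of elements of `universe` lying in none of the subsets."""
    return sum((-1) ** r * n_r
               for r, n_r in enumerate(level_sums(universe, subsets)))


students = set(range(30))
math21 = set(range(15))          # students 0..14
compsci13 = set(range(7, 19))    # students 7..18, of whom 7..14 took both

print(level_sums(students, [math21, compsci13]))     # [30, 27, 8]
print(count_in_none(students, [math21, compsci13]))  # 11
```

Comparing `count_in_none` with a direct count such as `len(students - math21 - compsci13)` is a quick way to test the formula on other data.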

One can rewrite (4.3) in a somewhat different form. To begin with, the Rule of Sum tells us that
$|S|$ equals the number of things not in any of the $S_i$'s plus the number of things that are in at least
one of the $S_i$'s. The latter equals

$$|S_1 \cup \cdots \cup S_m|.$$

Using (4.3) and noting that $N_0 = |S|$, we have

$$N_0 = |S_1 \cup \cdots \cup S_m| + N_0 - N_1 + N_2 - \cdots + (-1)^m N_m.$$

Rearranging leads to

Corollary With the same notation as in Theorem 4.1 (p. 95),

$$|S_1 \cup \cdots \cup S_m| = \sum_{i=1}^{m} (-1)^{i-1} N_i. \tag{4.4}$$

In this form, the Principle of Inclusion and Exclusion can be viewed as an extension of the Rule of
Sum: The Rule of Sum tells us that if $T = S_1 \cup \cdots \cup S_m$ and if each structure in T appears in exactly
one of the $S_i$, then

$$|T| = |S_1| + |S_2| + \cdots + |S_m|.$$

The left hand side of this equation is the left hand side of (4.4). The right hand side of this equation
is $N_1$, the first term on the right hand side of (4.4). The remaining terms on the right hand side of
(4.4) can be thought of as "corrections" to the Rule of Sum due to the fact that elements of T can
appear in more than one $S_i$.

Figure 4.1 A Venn diagram for three subsets $S_1$, $S_2$ and $S_3$.


Example 4.1  Venn diagrams  When $m$ is quite small in Theorem 4.1, it is possible to draw
a picture, called a Venn diagram, that illustrates the theorem. Figure 4.1 shows such a diagram for
$m = 3$. The interior of the box should be thought of as containing points which are the elements of
$S$. (These points are not actually shown in the diagram.) Similarly, the interior of the circle labeled
$S_1$ contains the elements of $S_1$ and its exterior contains the points not in $S_1$. Altogether, the three
circles for $S_1$, $S_2$ and $S_3$ divide the box into eight regions which we have numbered 0 through 7.

In the figure, region 7 corresponds to $S_1 \cap S_2 \cap S_3$. Region 0 corresponds to those elements of
$S$ that are not in any $S_i$. Region 3 corresponds to those elements of $S$ that are in $S_1$ and $S_2$ but
not in $S_3$. You should be able to describe all of the other regions in a similar manner. The elements
of $S_1$ are those in the four regions numbered 1, 3, 5 and 7. The elements of $S_1 \cap S_3$ are those in
regions 5 and 7. You should be able to describe all the intersections in the Principle of Inclusion
and Exclusion in this manner. You can then determine how often each region is counted in each $N_r$ and
thereby obtain a proof of (4.3) for $m = 3$. It is possible to generalize this argument to prove (4.3),
but we will give a slightly different proof of (4.3) later. □
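The region-counting argument for $m = 3$ can be checked mechanically. The sketch below assumes the regions are numbered in binary (bit $i-1$ of the region number is set exactly when the region lies inside $S_i$); this is inferred from the description of regions 3, 5 and 7 above, and the function name is only illustrative.

```python
from math import comb

def region_contribution(region: int, m: int = 3) -> int:
    """Net number of times one point of `region` is counted in N_0 - N_1 + ... ."""
    # Assumed numbering: bit i-1 of `region` is set when the region is inside S_i,
    # so region 3 = 0b011 means "in S_1 and S_2 but not in S_3".
    inside = bin(region).count("1")
    # The point is counted once in N_r for every r-subset of the sets containing it.
    return sum((-1) ** r * comb(inside, r) for r in range(m + 1))

contributions = {region: region_contribution(region) for region in range(8)}
print(contributions)
```

Only region 0 ends up with a net count of 1; every other region cancels to 0, which is exactly what (4.3) requires for $m = 3$.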

Example 4.2  Using the theorem  Many of Alice's 16 friends are athletic: they cycle, jog or
swim on a regular basis. In fact we know that 6 of them cycle, 6 of them jog, 6 of them swim, 4 of
them cycle and jog, 2 of them cycle and swim, 3 of them jog and swim and 2 of them engage in all
three activities. How many of Alice's friends do none of these things on a regular basis?

Let $S$ be the set of all friends, $S_1$ the set that cycle, $S_2$ the set that jog and $S_3$ the set that
swim. We will apply (4.3) with $m = 3$. The information we were given can be rewritten as follows:

$N_0 = 16$ since $|S| = 16$;

$N_1 = 18$ since $|S_1| = 6$, $|S_2| = 6$, $|S_3| = 6$;

$N_2 = 9$ since $|S_1 \cap S_2| = 4$, $|S_1 \cap S_3| = 2$, $|S_2 \cap S_3| = 3$;

$N_3 = 2$ since $|S_1 \cap S_2 \cap S_3| = 2$.

Thus the answer to our question is that 16 - 18 + 9 - 2 = 5 of her friends neither cycle nor jog nor
swim regularly. □
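The arithmetic of Example 4.2 can be reproduced in a few lines. This is a minimal sketch: the helper name `count_in_none` and the dictionaries are illustrative, and the cross-check by region sizes is an extra step that is not part of the example itself.

```python
def count_in_none(N):
    """Alternating sum N_0 - N_1 + N_2 - ... from (4.3)."""
    return sum((-1) ** r * n_r for r, n_r in enumerate(N))

# Data of Example 4.2: S_1 = cyclists, S_2 = joggers, S_3 = swimmers.
singles = {1: 6, 2: 6, 3: 6}
pairs = {(1, 2): 4, (1, 3): 2, (2, 3): 3}
triple = 2

N = [16, sum(singles.values()), sum(pairs.values()), triple]
answer = count_in_none(N)

# Cross-check: compute the size of each Venn region inside at least one circle.
only_pairs = {p: size - triple for p, size in pairs.items()}
only_singles = {
    i: singles[i] - triple - sum(v for p, v in only_pairs.items() if i in p)
    for i in singles
}
in_some_set = triple + sum(only_pairs.values()) + sum(only_singles.values())

print(N, answer, 16 - in_some_set)
```

Both routes give 5, and the region sizes are all nonnegative, so the data in the example are consistent.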

At  this  point  you  may  well  object  that  this  method  is  worse  than  useless  because  there  are  much 
easier  ways  to  get  the  answer.  For  example,  to  find  out  how  many  students  took  neither  Math  21 
nor  Comp  Sci  13,  it  would  be  easier  to  simply  ask  "How  many  of  you  have  had  neither  Math  21 
nor  Comp  Sci  13?"  This  is  true.  So  far  we've  just  been  getting  familiar  with  what  the  Principle  of 
Inclusion  and  Exclusion  means.  We  now  turn  to  some  examples  where  it  is  useful. 


4.1     The  Principle  of  Inclusion  and  Exclusion 
Example 4.3  Counting surjections  How many surjections are there from $\underline{n}$ to $\underline{k}$?

This problem is closely related to $S(n,k)$, the Stirling numbers of the second kind, which we
studied previously but couldn't find a formula for. In fact, a surjection $f$ defines a partition of the
domain into $k$ blocks where the $i$th block is $f^{-1}(i)$. Since the blocks are all distinct and $S(n,k)$ does
not care about the order of the blocks, the number of surjections is $k!\,S(n,k)$. Our attention will be
devoted to the surjections (we don't need $S(n,k)$ here), but we pointed out the connection because
it will allow us to get a formula for $S(n,k)$, too.

Let $S$ be the set of all functions from $\underline{n}$ to $\underline{k}$ and let $S_i$ be the set of those functions that never
take on the value $i$. In this notation, the set of surjections is the subset of $S$ that does not belong
to any of $S_1, \ldots, S_k$ because a surjection takes on all values in its range. This suggests that we use
(4.3) with $m = k$.

We found long ago that $|S| = k^n$. It is equally easy to find $|S_{i_1} \cap \cdots \cap S_{i_r}|$: The set whose
cardinality we are taking is just all functions from $\underline{n}$ to $\underline{k} - \{i_1, \ldots, i_r\}$ and so its cardinality
equals $(k-r)^n$. This tells us that each of the terms in the sum (4.2) defining $N_r$ equals $(k-r)^n$.
Consequently, $N_r$ is $(k-r)^n$ times the number of terms. Since there are $\binom{k}{r}$ subsets of $\underline{k}$ of size $r$,
the sum contains $\binom{k}{r}$ terms and $N_r = \binom{k}{r}(k-r)^n$. It follows from (4.3) that the number of surjections is

$$\sum_{i=0}^{k} (-1)^i \binom{k}{i} (k-i)^n.$$    (4.5)

Combining this with the remarks at the start of this example, we have

$$S(n,k) = \frac{1}{k!} \sum_{i=0}^{k} (-1)^i \binom{k}{i} (k-i)^n.$$    (4.6)

Because of the possibility of considerable cancellation due to alternating signs, numerical evaluation
of this expression for large values of $n$ and $k$ can be awkward. □

In learning to apply the Principle of Inclusion and Exclusion, it can be difficult to decide what
the sets $S, S_1, \ldots$ should be. It is often helpful to think in terms of

•  a larger problem $S$ that is easier to solve and

•  conditions $C_i$ that all do NOT hold for precisely those structures in $S$ that are solutions of the
original problem.

What's the connection between all this and the sets in Theorem 4.1? The set $S_i$ is the set of structures
in the larger problem $S$ that satisfy $C_i$. Note that NOT appears in our description because (4.3)
counts those elements of $S$ that are NOT in any of the $S_i$'s.

Let's look at the previous example in these terms. Our larger problem is the set of all functions
from $\underline{n}$ to $\underline{k}$; that is, $S = \underline{k}^{\underline{n}}$. Since we want those functions that do NOT omit any of the values
$1, \ldots, k$, we take $C_i$ to be the condition that $f : \underline{n} \to \underline{k}$ omits the value $i$; that is, $i \notin \mathrm{Image}(f)$.

Sometimes people talk about properties instead of conditions. In this case, they speak of "having
a property" instead of "satisfying a condition."
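The surjection count of Example 4.3 and the resulting formula for the Stirling numbers can be compared against brute-force enumeration for small $n$ and $k$. The function names below are illustrative, and the code represents $\underline{k}$ as `range(k)` rather than $\{1, \ldots, k\}$, which does not affect the counts.

```python
from itertools import product
from math import comb, factorial

def surjections(n: int, k: int) -> int:
    """Number of surjections from an n-set to a k-set by inclusion and exclusion."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1))

def surjections_brute(n: int, k: int) -> int:
    """Enumerate all k^n functions and keep those that hit every value."""
    return sum(1 for f in product(range(k), repeat=n) if len(set(f)) == k)

def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind: surjections divided by k!."""
    return surjections(n, k) // factorial(k)

print(surjections(4, 2), surjections_brute(4, 2), stirling2(4, 2))
print(surjections(5, 3), surjections_brute(5, 3), stirling2(5, 3))
```

Python integers are exact, so the cancellation mentioned in the example causes no loss of accuracy here; it would matter if the terms were evaluated in floating point.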
Example 4.4  Counting solutions to equations  How many different solutions are there to
the equation

$$x_1 + x_2 + x_3 + x_4 + x_5 = n,$$    (4.7)

where the $x_i$'s must be positive integers, none of which exceeds $k$?

Since it is easier to solve the equation without the constraint that the $x_i$'s not exceed $k$, we'll
use Theorem 4.1 as follows:

•  Let $S$ be the set of all positive integer solutions $(x_1, \ldots, x_5)$ of the equation.

•  Let the $i$th condition be $x_i > k$ for $i$ = 1, 2, 3, 4 and 5. The answer to the original problem is
the number of elements of $S$ (i.e., positive integer solutions) that satisfy none of the conditions.

For this to be useful, we must be able to easily determine, for example, how many solutions to (4.7)
have $x_1 > k$ and $x_3 > k$. This is simply the number of solutions to $y_1 + \cdots + y_5 = n - 2k$ because
we can take $x_1 = y_1 + k$, $x_3 = y_3 + k$ and $x_j = y_j$ for $j$ = 2, 4 and 5.

We are ready to apply (4.3) with $m = 5$. To begin with, what is $|S|$? A solution in $S$ can be
obtained by inserting commas and plus signs in the $n - 1$ spaces between the $n$ ones in (1 1 1 ... 1)
in such a way that either a plus or a comma, but not both, is inserted in each space and exactly
4 commas are used. Thus $|S| = \binom{n-1}{4}$. By this and the end of the previous paragraph, it follows that

$$|S_{i_1} \cap \cdots \cap S_{i_r}| = \binom{n - kr - 1}{4},$$    (4.8)

where the binomial coefficient is taken to be zero if $n - kr - 1 < 4$. Since there are $\binom{5}{r}$ choices for
the set $\{i_1, \ldots, i_r\}$, the number of solutions to (4.7) is

$$\binom{n-1}{4} - \binom{5}{1}\binom{n-k-1}{4} + \binom{5}{2}\binom{n-2k-1}{4} - \binom{5}{3}\binom{n-3k-1}{4} + \binom{5}{4}\binom{n-4k-1}{4} - \binom{5}{5}\binom{n-5k-1}{4}.$$

This formula is a bit tricky. If we blindly replace the binomial coefficients using the falling factorial
formula

$$\binom{m}{4} = \frac{m(m-1)(m-2)(m-3)}{24}$$    (4.9)

and use algebra to simplify the result, we will discover that the number of solutions is zero! How
can this be? The definition of binomial coefficient that we used for (4.8) gives $\binom{m}{4} = 0$ when $m < 0$,
which does not agree with the falling factorial formula (4.9). Thus (4.9) cannot be used when $m < 0$.
The problem that we have been considering can be interpreted in other ways:

•  How many compositions of $n$ are there that consist of five parts, none of which exceeds $k$?

•  How many ways can $n$ unlabeled balls be placed into five labeled boxes so that no box has
more than $k$ balls?

You should easily be able to see that these problems are all equivalent. □
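The warning about the falling factorial formula can be seen numerically. The sketch below evaluates the inclusion and exclusion sum of Example 4.4 twice: once with the binomial coefficient taken to be zero when its upper index is less than 4, and once with the polynomial (4.9) used blindly. The function names are illustrative, and the brute-force count is included only as a check.

```python
from itertools import product
from math import comb

def binom4(m: int) -> int:
    """Binomial coefficient C(m, 4), taken to be 0 when m < 4 (including m < 0)."""
    return comb(m, 4) if m >= 4 else 0

def falling4(m: int) -> int:
    """The polynomial m(m-1)(m-2)(m-3)/24, which is nonzero for negative m."""
    return m * (m - 1) * (m - 2) * (m - 3) // 24  # always an exact division

def bounded_solutions(n: int, k: int, coeff=binom4) -> int:
    """Positive solutions of x_1+...+x_5 = n with every x_i <= k."""
    return sum((-1) ** r * comb(5, r) * coeff(n - k * r - 1) for r in range(6))

def bounded_solutions_brute(n: int, k: int) -> int:
    return sum(1 for x in product(range(1, k + 1), repeat=5) if sum(x) == n)

print(bounded_solutions(10, 3), bounded_solutions_brute(10, 3))
print(bounded_solutions(10, 3, coeff=falling4))
```

With the polynomial, the sum is a fifth finite difference of a polynomial of degree 4, which is why it collapses to zero for every $n$ and $k$.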
Finally, the proof of Theorem 4.1:

Proof: Suppose that $s \in S$. Let $X \subseteq \{1, \ldots, m\}$ be such that $x \in X$ if and only if $s \in S_x$; that is, $X$ is the
set consisting of the indices of those $S_i$'s that contain $s$. How much does $s$ contribute to the sum in
(4.3)? For the theorem to be true, it must contribute 1 if $X = \emptyset$ and 0 otherwise.
Clearly $s$ contributes 1 to the sum when $X = \emptyset$, but what happens when $X \ne \emptyset$?
To begin with, what does $s$ contribute to $N_r$ when $r > 0$? It contributes nothing to some terms
and contributes 1 to those terms in $N_r$ for which $s \in S_{i_1} \cap \cdots \cap S_{i_r}$. By the definition of $X$, this
happens if and only if $\{i_1, \ldots, i_r\} \subseteq X$. Thus $s$ contributes to precisely those terms of $N_r$ that
correspond to $r$-element subsets of $X$. Since there are $\binom{|X|}{r}$ such terms, $s$ contributes $\binom{|X|}{r}$ to $N_r$. This is 0 if
$r > |X|$. Thus the contribution of $s$ to (4.3) is

$$1 + \sum_{r=1}^{m} (-1)^r \binom{|X|}{r} = \sum_{r=0}^{|X|} (-1)^r \binom{|X|}{r}.$$

By the binomial theorem, this sum is $(1 - 1)^{|X|} = 0^{|X|}$, which is zero when $|X| > 0$.
A different proof using "characteristic functions" is given in Exercise 4.1.10. □


Example 4.5  Derangements  Recall that a derangement of $\underline{n}$ is a permutation $f$ such that
$f(x) = x$ has no solutions; i.e., the permutation has no cycles of length 1. A cycle of length 1 is also
called a "fixed point." Let $D_n$ be the number of derangements of $\underline{n}$. What is the value of $D_n$?

Let the set $S$ of objects be all permutations of $\underline{n}$ and, for $1 \le i \le n$, let $S_i$ be those permutations
having $i$ as a fixed point. In other words, the larger problem is counting all permutations. If $\sigma$ is a
permutation, condition $C_i$ states that $\sigma(i) = i$.

The set $S_{i_1} \cap \cdots \cap S_{i_r}$ consists of those permutations for which the $r$ elements of $I = \{i_1, \ldots, i_r\}$
are fixed points. Since such permutations can be thought of as permutations of $\underline{n} - I$, there are
$(n-r)!$ of them. Thus $N_r = \binom{n}{r}(n-r)! = n!/r!$. By (4.3),

$$D_n = n! \sum_{i=0}^{n} \frac{(-1)^i}{i!}.$$    (4.10)

We will use the following theorem from calculus to obtain a simple approximation to $D_n$.

Theorem 4.2  Alternating series  Suppose that $|b_0| \ge |b_1| \ge |b_2| \ge \cdots$, that the values of
$b_k$ alternate in sign and that $\lim_{k\to\infty} b_k = 0$. Then $\sum_{k=0}^{\infty} b_k$ converges and

$$\left| \sum_{k=0}^{\infty} b_k - \sum_{k=0}^{n} b_k \right| \le |b_{n+1}|.$$

The terms in (4.10) alternate in sign and decrease in magnitude. Since

$$\sum_{k=0}^{\infty} \frac{(-1)^k}{k!} = \frac{1}{e},$$

it follows that $D_n$ differs from $n!/e$ by at most $n!/(n+1)! = 1/(n+1)$. Hence, for $n \ge 1$, $D_n$ is the closest
integer to $n!/e$. □
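Formula (4.10) and the "closest integer to $n!/e$" statement are easy to test for small $n$. The following is a minimal sketch with illustrative function names; the rounding check uses floating point, so it is only meant for small $n$ where `factorial(n) / e` is represented accurately.

```python
from itertools import permutations
from math import e, factorial

def derangements(n: int) -> int:
    """D_n from (4.10); n!/i! is an integer, so the arithmetic stays exact."""
    return sum((-1) ** i * (factorial(n) // factorial(i)) for i in range(n + 1))

def derangements_brute(n: int) -> int:
    """Count permutations of range(n) with no fixed point by enumeration."""
    return sum(
        1 for p in permutations(range(n)) if all(p[i] != i for i in range(n))
    )

for n in range(1, 9):
    print(n, derangements(n), round(factorial(n) / e))
```

For $n$ from 1 to 8 the two printed columns agree, as the error bound $1/(n+1)$ predicts.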
Exercises


4.1.1.  Let us define a "typical four letter word" to be a string of four letters $L_1L_2L_3L_4$ where $L_1$ and $L_4$
are consonants and at least one of $L_2$ and $L_3$ is a vowel.

(a)  For $i = 2$ and $i = 3$, let $V_i$ be the set of sequences $L_1L_2L_3L_4$ where $L_1$ and $L_4$ are consonants,
$L_i$ is a vowel and the remaining letter is arbitrary. Draw a Venn diagram for the two sets $V_2$
and $V_3$. Indicate what part of the diagram corresponds to typical four letter words and calculate
the number of such words by using the Rule of Product and the Principle of Inclusion and
Exclusion.

(b)  For $i = 2$ and $i = 3$, let $C_i$ be the set of sequences $L_1L_2L_3L_4$ where $L_1$, $L_i$ and $L_4$ are consonants
and the remaining letter is arbitrary. Draw a Venn diagram for the two sets $C_2$ and $C_3$. Indicate
what part of the diagram corresponds to typical four letter words and calculate the number of
such words.

4.1.2.  How  many  ways  can  n  married  couples  be  paired  up  to  form  n  couples  so  that  each  couple  consists 
of  a  man  and  a  woman  and  so  that  no  couple  is  one  of  the  original  married  couples? 

4.1.3.  Charles Dodgson (Lewis Carroll) speaks of a battle among 100 combatants in which 80 lost an arm,
85 a leg, 70 an eye and 75 an ear. (Yes, it's gruesome, but that's the way he stated it.) Some number
$p$ of people lost all four.

(a)  Is it possible that $p$ could be as large as 70? Why?
*(b)  Find a lower bound for $p$ and explain how your lower bound could actually be achieved.
Hint. A key to getting a lower bound is to realize that there are only 100 people.

4.1.4.  How many ways can we make an $n$-card hand that contains at least one card from each of the
4 suits?

Hint. Let a property of a hand be the absence of a suit.

4.1.5.  Let $\varphi(N)$ be the number of integers between 1 and $N$ inclusive that have no factors in common
with $N$. Thus $\varphi(1) = \varphi(2) = 1$, $\varphi(3) = \varphi(4) = \varphi(6) = 2$ and $\varphi(5) = 4$. $\varphi$ is called the Euler phi
function. Let $p_1, \ldots, p_n$ be the primes that divide $N$. For example, when $N = 300$, the list of primes
is 2, 3, 5. Let $S_j$ be the set of $x \in \underline{N}$ such that $x$ is divisible by $p_j$, or, equivalently, the $j$th property
is that $p_j$ divides the number.

(a)  Prove that (4.3) determines $\varphi(N)$.

(b)  Prove

$$|S_{i_1} \cap \cdots \cap S_{i_r}| = \frac{N}{p_{i_1} \cdots p_{i_r}}.$$

(c)  Use this to prove

$$\varphi(N) = N \prod_{k=1}^{n} \left(1 - \frac{1}{p_k}\right).$$


4.1.6.  Call an $n \times n$ matrix $A$ of zeroes and ones bad if there is an index $k$ such that $a_{k,i} = a_{i,k} = 0$ for
$1 \le i \le n$. In other words, the row and column passing through $(k, k)$ consist entirely of zeroes. Let
$g(n)$ be the number of $n \times n$ matrices of zeroes and ones which are not bad.

(a)  For any subset $K$ of $\underline{n}$, let $z(K)$ be the number of $n \times n$ matrices $A$ of zeroes and ones such that
$a_{i,j} = 0$ if either $i$ or $j$ or both belong to $K$. Explain why $z(K)$ depends only on $|K|$ and obtain
a simple formula for $z(K)$. Call it $z_k$ where $k = |K|$.

(b)  Express $g(n)$ as a fairly simple sum in terms of $z_k$.

4.1.7.  Let $C$ be the multiset $\{c_1, c_1, c_2, c_2, \ldots, c_m, c_m\}$ containing two copies each of $m$ distinct symbols.
How many ways can the elements of $C$ be arranged in an ordered list so that adjacent symbols are
distinct?

Hint. A list in which $c_i$ and $c_i$ are adjacent can be thought of as a list made from the multiset

$$C \cup \{c_i^2\} - \{c_i, c_i\},$$

where $c_i^2$ is a new symbol that stands for $c_ic_i$ in the list.
*4.1.8.  This is the same as the previous exercise, except that now each of $c_1$ through $c_m$ appears in $C$ three
times instead of twice. The constraint is still the same: Adjacent symbols must be distinct.
Hint. There are now two types of properties, namely $c_ic_i$ appearing in the list and $c_ic_ic_i$ appearing
in the list. Call the corresponding sets $S_i^2$ and $S_i^3$. In computing $N_r$ you need to consider how many
times you require an $S_i^2$ and how many times you require an $S_i^3$ as well as $S_i^2$.

4.1.9.  Let $(S, \Pr)$ be a probability space and let $S_1, \ldots, S_m$ be subsets of $S$. Prove that

$$\Pr\bigl((S_1 \cup \cdots \cup S_m)^c\bigr) = \sum_{i=0}^{m} (-1)^i N_i \quad\text{where}\quad N_r = \sum \Pr(S_{i_1} \cap \cdots \cap S_{i_r}),$$

the sum ranging over all $r$-long strictly increasing sequences chosen from $\underline{m}$.

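4.1.10.  The goal of this exercise is to use "characteristic functions" to prove Theorem 4.1 (p. 95). Let
$\chi_i : S \to \{0, 1\}$ be the characteristic function of $S_i$; that is,

$$\chi_i(s) = \begin{cases} 1, & \text{if } s \in S_i, \\ 0, & \text{if } s \notin S_i. \end{cases}$$

(a)  Explain why the number we want in the theorem is

$$\sum_{s \in S} \prod_{i=1}^{m} \bigl(1 - \chi_i(s)\bigr).$$

(b)  Prove that

$$\prod_{i=1}^{m} \bigl(1 - \chi_i(s)\bigr) = \sum_{I \subseteq \underline{m}} (-1)^{|I|} \prod_{i \in I} \chi_i(s).$$

(c)  Complete the proof of Theorem 4.1.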

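4.1.11.  We want to count the number of elements in exactly $k$ of the $S_i$. Let $K^c$ be the complement of $K$
relative to $\underline{m}$; that is, $K^c = \underline{m} \setminus K$.

(a)  Explain why the number we want is

$$\sum_{s \in S} \sum_{\substack{K \subseteq \underline{m} \\ |K| = k}} \Bigl(\prod_{i \in K} \chi_i(s)\Bigr) \Bigl(\prod_{i \in K^c} \bigl(1 - \chi_i(s)\bigr)\Bigr).$$

(b)  Show that this expression is

$$\sum_{s \in S} \sum_{\substack{K \subseteq \underline{m} \\ |K| = k}} \sum_{J \subseteq K^c} (-1)^{|J|} \prod_{i \in J \cup K} \chi_i(s).$$

(c)  Show that this equals

$$\sum_{s \in S} \sum_{L \subseteq \underline{m}} \sum_{\substack{K \subseteq L \\ |K| = k}} (-1)^{|L| - |K|} \Bigl(\prod_{i \in L} \chi_i(s)\Bigr) = \sum_{s \in S} \sum_{L \subseteq \underline{m}} (-1)^{|L| - k} \Bigl(\prod_{i \in L} \chi_i(s)\Bigr) \binom{|L|}{k}.$$

(d)  Conclude that the number of elements in $S$ that belong to exactly $k$ of the $S_i$ is

$$\sum_{\ell = k}^{m} (-1)^{\ell - k} \binom{\ell}{k} N_\ell.$$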
*Bonferroni's Inequalities


We conclude this section by looking briefly at two more advanced topics related to the Principle of
Inclusion and Exclusion: Bonferroni's Inequalities and partially ordered sets.

Theorem 4.1 can sometimes be a bit of a problem to use even after we've formulated our problem
and know exactly what we must count. There are two reasons for this. First, there will be a lot of
addition and subtraction to do if $m$ is large. Second, it may be difficult to actually compute values
of $N_r$ so we may have to be content with estimating them for small values of $r$ and ignoring them
when $r$ is large. Because of these problems, we may prefer to obtain a quick approximation to (4.3).
A method that is frequently useful for doing this is provided by the following theorem.

Theorem 4.3  Bonferroni's inequalities  Let the notation be the same as in Theorem 4.1
(p. 95) and let $E$ be the number of elements of $S$ not in any of the $S_i$. Then

$$\left| E - \sum_{r=0}^{t-1} (-1)^r N_r \right| \le N_t;$$

i.e., truncating the sum gives an error which is no larger than the first term that was neglected.
Furthermore, the sum is either an overestimate or an underestimate according as $t$ is odd or
even, respectively.

We can't prove this simply by appealing to Theorem 4.2 (p. 99) because the terms may be increasing
in size. The proof of Bonferroni's Inequalities is left as an exercise.

Example 4.6  Using the theorem  Let $r(n,k)$ be the fraction of those functions in $\underline{k}^{\underline{n}}$ which
are surjections. Using Bonferroni's inequalities and the ideas in Example 4.3 (p. 97), we'll estimate
$r(n,k)$.

Let's begin with $t = 2$ in the theorem. In that case we simply need to divide the $i = 0$ and $i = 1$
terms in (4.5) by the total number of functions. Thus

$$r(n,k) \ge \frac{k^n - k(k-1)^n}{k^n} = 1 - k(1 - 1/k)^n.$$

With $k = 10$ and $n = 40$, we see that at least 85.2% of the functions in $\underline{10}^{\underline{40}}$ are surjections.
If we set $t = 3$ in Bonferroni's inequalities, we obtain the upper bound

$$r(n,k) \le 1 - k(1 - 1/k)^n + \binom{k}{2}(1 - 2/k)^n.$$

With $k = 10$ and $n = 40$, we see that at most 85.8% of the functions in $\underline{10}^{\underline{40}}$ are surjections. □
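The two bounds quoted in Example 4.6 can be recomputed exactly. The sketch below assumes the surjection count is the alternating sum with terms $(-1)^i \binom{k}{i}(k-i)^n$ from Example 4.3 and simply truncates it after $t$ terms; the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def truncated_fraction(n: int, k: int, t: int) -> Fraction:
    """First t terms (i = 0, ..., t-1) of the surjection sum, divided by k^n."""
    return sum(
        Fraction((-1) ** i * comb(k, i) * (k - i) ** n, k ** n) for i in range(t)
    )

lower = truncated_fraction(40, 10, 2)   # t even: an underestimate
upper = truncated_fraction(40, 10, 3)   # t odd: an overestimate
exact = truncated_fraction(40, 10, 11)  # all terms: the true fraction

print(float(lower), float(upper), float(exact))
```

The printed values are roughly 0.852 and 0.858 for the bounds, with the exact fraction between them, in agreement with the percentages stated in the example.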
*Partially Ordered Sets


There is an important generalization of the Principle of Inclusion and Exclusion (Theorem 4.1 (p. 95))
which we'll just touch on. It requires some new concepts.

A binary relation $\rho$ on a set $S$ is a subset of $S \times S$. Instead of writing $(x, y) \in \rho$, people write
$x \rho y$. For example, if $S$ is a set of integers, then we can let $\rho$ be the set of all pairs $(x, y)$ with $x, y \in S$ and $x$ less
than $y$. Thus, $x \rho y$ if and only if $x$ is less than $y$. People usually use the notation $<$ for this binary
relation. As another example, we can let $S$ be the set of all subsets of $\underline{n}$ and let $\subseteq$ be the binary
relation.

We can describe equivalence relations as binary relations: Let $\sim$ be an equivalence relation on
$S$. Those pairs $(x, y)$ for which $x \sim y$ form a subset of $S \times S$.
We now define another important binary relation.

Definition 4.1  Partially Ordered Set  A set $P$ and a binary relation $\rho$ satisfying

(P-1)  $x \rho x$ for all $x \in P$;

(P-2)  if $x \rho y$ and $y \rho x$, then $x = y$; and

(P-3)  if $x \rho y$ and $y \rho z$, then $x \rho z$

is called a partially ordered set, also called a poset. The binary relation $\rho$ is called a partial
order.

The real numbers with $x \rho y$ meaning "$x$ is less than or equal to $y$" is a poset. The subsets of a set
with $x \rho y$ meaning $x \subseteq y$ is a poset. Because of these examples, people often use the symbol $\le$ or
the symbol $\subseteq$ in place of $\rho$, even when the partial order does not involve numbers or subsets.
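The three axioms (P-1) through (P-3) can be checked by brute force on a finite set. This is only a sketch: the function name `is_partial_order` is illustrative, and a check on a finite sample such as the integers 1 to 12 says nothing about an infinite set like the real numbers.

```python
from itertools import combinations, product

def is_partial_order(elements, rel) -> bool:
    """Check (P-1), (P-2) and (P-3) for the relation `rel` on a finite collection."""
    elements = list(elements)
    reflexive = all(rel(x, x) for x in elements)
    antisymmetric = all(
        x == y or not (rel(x, y) and rel(y, x))
        for x, y in product(elements, repeat=2)
    )
    transitive = all(
        rel(x, z) or not (rel(x, y) and rel(y, z))
        for x, y, z in product(elements, repeat=3)
    )
    return reflexive and antisymmetric and transitive

numbers = range(1, 13)
subsets = [frozenset(c) for r in range(4) for c in combinations((1, 2, 3), r)]

print(is_partial_order(numbers, lambda x, y: x <= y))      # "less than or equal"
print(is_partial_order(numbers, lambda x, y: x < y))       # fails (P-1)
print(is_partial_order(numbers, lambda x, y: y % x == 0))  # "x divides y"
print(is_partial_order(subsets, lambda x, y: x <= y))      # set inclusion
```

The strict relation "less than" is rejected because no element is related to itself, while the other three relations pass all three tests on these samples.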

We now return to Theorem 4.1 and begin by rewriting the terms in (4.2) as functions of sets:
$f(\{i_1, \ldots, i_r\}) = |S_{i_1} \cap \cdots \cap S_{i_r}|$. How should we define $f(\emptyset)$? It should be the size of the empty
intersection. In most situations, the best choice for the empty intersection is everything. Thus we
should probably take $f(\emptyset) = |S|$, the size of the set that contains everything.

Many people find this choice for the empty intersection confusing, so we digress briefly to explain
it. (If it does not confuse you, skip to the next paragraph.) Let's look at something a bit more
familiar: summations. As you know, the value of

$$g(A) = \sum_{i \in A} h(i)$$    (4.11)

is defined to be the sum of $h(i)$ over all $i \in A$. You should easily see that if $A$ and $B$ are disjoint
nonempty sets, then

$$g(A \cup B) = g(A) + g(B).$$    (4.12)

If we want this to be true for $B = \emptyset$, we must have

$$g(A) + g(\emptyset) = g(A \cup \emptyset) = g(A),$$    (4.13)

and so $g(\emptyset)$ must equal 0, the identity element for addition. Suppose we replace the sum in (4.11) with
a product. Then (4.12) becomes $g(A \cup B) = g(A)g(B)$ and the parallel to (4.13) gives $g(A)g(\emptyset) =
g(A)$. Thus $g(\emptyset)$ should be 1, the identity for multiplication. Instead of $g$ and $h$ being numerically
valued functions, they could be set valued functions and we could replace the summation in (4.11)
with either a set union or a set intersection. (In terms of the previous notation, we would write
$A_i$ instead of $h(i)$.) Then $g(\emptyset)$ would be taken to be the identity for set union or set intersection
respectively; that is, either $g(A) \cup g(\emptyset) = g(A)$ or $g(A) \cap g(\emptyset) = g(A)$. This leads to $g(\emptyset) = \emptyset$ and
$g(\emptyset) = S$, respectively.


104       Chapter  4    Sieving  Methods 


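Let's recap where we were before our digression: We defined

$$f(\{i_1, \ldots, i_r\}) = |S_{i_1} \cap \cdots \cap S_{i_r}|$$

for $r \ne 0$ and $f(\emptyset) = |S|$. Now, $N_r$ is simply the sum of $f(R)$ over all subsets $R$ of $\underline{m}$ of size $r$; i.e.,

$$N_r = \sum_{\substack{R \subseteq \underline{m} \\ |R| = r}} f(R).$$

We can rewrite (4.3) in the form

$$(\text{number of elements of } S \text{ in none of the } S_i) = \sum_{R \subseteq \underline{m}} (-1)^{|R|} f(R).$$

In words, we can describe $f(\{i_1, \ldots, i_r\})$ as the number of things that satisfy conditions $i_1, \ldots, i_r$ and
possibly others. In a similar manner, we could define a function $e(\{i_1, \ldots, i_r\})$ to be the number of things
that satisfy conditions $i_1, \ldots, i_r$ and none of the other conditions in the collection $1, \ldots, m$. In these
terms, (4.3) is a formula for $e(\emptyset)$. Also, by the definitions of $e$ and $f$, we have $f(R) = \sum_{Q \supseteq R} e(Q)$.
We state without proof a generalization of the Principle of Inclusion and Exclusion.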
We  state  without  proof  a  generalization  of  the  Principle  of  Inclusion  and  Exclusion. 

Theorem 4.4  Let $P$ be the partially ordered set of subsets of $\underline{k}$. For any two functions $e$ and
$f$ with domain $P$,

$$f(x) = \sum_{y \supseteq x} e(y) \quad \text{for all } x \in P$$    (4.14)

if and only if

$$e(x) = \sum_{y \supseteq x} (-1)^{|y| - |x|} f(y) \quad \text{for all } x \in P.$$    (4.15)

This result can be extended to any finite partially ordered set $P$ if $(-1)^{|y| - |x|}$ is replaced
by a function $\mu(x, y)$, called the Möbius function of the partially ordered set $P$. We will not explore
this.
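Theorem 4.4 is stated without proof, but both directions can be tested numerically on the subsets of a small set. The sketch below uses $k = 4$ and random integer values; the helper names are illustrative, and a numerical check on one example is of course not a proof.

```python
from itertools import combinations
import random

def all_subsets(k: int):
    """All subsets of {1, ..., k} as frozensets."""
    items = range(1, k + 1)
    return [frozenset(c) for r in range(k + 1) for c in combinations(items, r)]

def sum_over_supersets(e, P):
    """f(x) = sum of e(y) over y containing x, as in (4.14)."""
    return {x: sum(e[y] for y in P if y >= x) for x in P}

def signed_sum_over_supersets(f, P):
    """e(x) = sum of (-1)^(|y|-|x|) f(y) over y containing x, as in (4.15)."""
    return {
        x: sum((-1) ** (len(y) - len(x)) * f[y] for y in P if y >= x) for x in P
    }

random.seed(0)
P = all_subsets(4)

# Direction 1: start from an arbitrary e, build f by (4.14), recover e by (4.15).
e = {x: random.randint(-5, 5) for x in P}
f = sum_over_supersets(e, P)
recovered_e = signed_sum_over_supersets(f, P)

# Direction 2: start from an arbitrary f, build e by (4.15), recover f by (4.14).
f2 = {x: random.randint(-5, 5) for x in P}
recovered_f2 = sum_over_supersets(signed_sum_over_supersets(f2, P), P)

print(recovered_e == e, recovered_f2 == f2)
```

Taking $x = \emptyset$ in the second formula gives back the Principle of Inclusion and Exclusion, since the sum then runs over all subsets with sign $(-1)^{|y|}$.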


Exercises 


4.1.12.  This exercise extends the Principle of Inclusion and Exclusion (4.3). Let $E_k$ be the number of
elements of $S$ that lie in exactly $k$ of the sets $S_1, S_2, \ldots, S_m$. Prove that

$$E_k = \sum_{i=0}^{m-k} (-1)^i \binom{k+i}{i} N_{k+i}.$$    (4.16)

4.1.13.  The purpose of this exercise is to prove Bonferroni's inequalities.

(a)  Prove the inequalities are equivalent to the statement that the sums

$$c_t(X) = \sum_{r=0}^{t} (-1)^r \binom{|X|}{r}$$

alternate in sign until eventually becoming 0 for $t \ge |X|$.

(b)  Prove $c_t(X) = (-1)^t \binom{|X| - 1}{t}$ and so complete the proof.

4.1.14.  Find a formula that bears the same relation to Bonferroni's inequalities that (4.4) bears to (4.3); i.e.,
find inequalities for approximations to $|S_1 \cup \cdots \cup S_m|$ rather than for approximations to $E$.
4.1.15.  Consider the following algorithm due to H. Wilf.

Initialize: Let the $N_i$ be defined as in (4.3).
Loop: Execute the following code.

    For j = 0, 1, ..., m - 1
        For i = m - 1, m - 2, ..., j
            N_i = N_i - N_{i+1}
        End for
    End for

The loop on $i$ means that $i$ starts at $m - 1$ and decreases to $j$.

(a)  By carrying out the algorithm for $m = 2$ and $m = 3$, prove that $N_i$ is replaced by $E_i$, where $E_i$
is given by Exercise 4.1.12 for these values of $m$.

(b)  We can rephrase the algorithm in a set theoretic form. Replace $N_r$ by $N_r^*$, the multiset which
contains each $s \in S$ as many times as there are $1 \le i_1 < i_2 < \cdots < i_r \le m$ such that

$$s \in S_{i_1} \cap \cdots \cap S_{i_r}.$$    (4.17)

By the definition of $N_r$, it follows immediately that $N_r = |N_r^*|$. Similarly, replace $E_k$ by $E_k^*$, the
set of those elements of $S$ that belong to exactly $k$ of the sets $S_1, \ldots, S_m$. Replace $N_i$ by $N_i^*$ in
the algorithm and interpret $N_i^* - N_{i+1}^*$ to be the multiset that contains $s \in S$ as many times as
it appears in $N_i^*$ minus the number of times it appears in $N_{i+1}^*$. We claim that the algorithm
now stops with $N_i^*$ replaced by $E_i^*$. Prove this for $m = 2$ and $m = 3$.

*(c)  Using induction on $t$, prove the set theoretic form of the algorithm by proving that after $t$
iterations of the loop on $j$; i.e., $j = 0, \ldots, t - 1$, the following is true. If $s \in S$ appears in exactly
$p$ of the sets $S_1, \ldots, S_m$, then it appears in $N_r^*$ with multiplicity

$$\mu(p, r, t) = \begin{cases} \binom{p-t}{r-t}, & \text{if } t \le p; \\ 1, & \text{if } t > p \text{ and } r = p; \\ 0, & \text{if } t > p \text{ and } r \ne p. \end{cases}$$    (4.18)

Also prove that no $s \in S$ ever appears more times in an $N_{i+1}^*$ than it does in an $N_i^*$ when we
are calculating $N_i^* - N_{i+1}^*$.

(d)  Prove that the validity of the set theoretic form of the algorithm implies the validity of the
numerical form of the algorithm.
Hint. Use the last sentence in (c).
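The numerical form of the algorithm in Exercise 4.1.15 is short enough to run directly. The sketch below is one possible transcription of the two nested loops, tried on the numbers from Example 4.2 and on a small set-based example (multiples of 2, 3 and 5 among 1, ..., 30) that is chosen here purely for illustration. Running it does not replace the proofs the exercise asks for.

```python
from itertools import combinations

def wilf(N):
    """Apply the double loop to a copy of [N_0, ..., N_m] and return the result."""
    N = list(N)
    m = len(N) - 1
    for j in range(m):                      # j = 0, 1, ..., m - 1
        for i in range(m - 1, j - 1, -1):   # i = m - 1, m - 2, ..., j
            N[i] -= N[i + 1]
    return N

def n_values(universe, sets):
    """[N_0, ..., N_m], where N_r sums |S_{i_1} & ... & S_{i_r}| over r-subsets."""
    values = []
    for r in range(len(sets) + 1):
        total = 0
        for chosen in combinations(sets, r):
            common = set(universe)          # the empty intersection is everything
            for s in chosen:
                common &= s
            total += len(common)
        values.append(total)
    return values

def exact_counts(universe, sets):
    """[E_0, ..., E_m], where E_k counts elements lying in exactly k of the sets."""
    E = [0] * (len(sets) + 1)
    for x in universe:
        E[sum(x in s for s in sets)] += 1
    return E

universe = range(1, 31)
sets = [{x for x in universe if x % p == 0} for p in (2, 3, 5)]

print(wilf([16, 18, 9, 2]))
print(n_values(universe, sets), wilf(n_values(universe, sets)))
print(exact_counts(universe, sets))
```

For the data of Example 4.2 the first entry of the result is 5, the number of friends doing none of the activities, and in the second example the output of `wilf` matches the directly counted values.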

4.1.16.  Let $D_n(k)$ be the number of permutations of $\underline{n}$ that have exactly $k$ fixed points. Thus $D_n(0) = D_n$,
the number of derangements of $\underline{n}$.

(a)  Use Exercise 4.1.12 to obtain a formula for $D_n(k)$.

(b)  Give a simple, direct combinatorial proof that $D_n(k) = \binom{n}{k} D_{n-k}$.

(c)  Using algebra and (4.10), prove the answers in (a) and (b) are equal.

4.1.17.  Let $A = \{a_1, \ldots, a_m\}$ be a set of $m$ integers, all greater than 1. Let $d(n, k, A)$ be the number of
integers in $\underline{n}$ that are divisible by exactly $k$ of the integers in $A$.

(a)  Assuming that the elements of $A$ are distinct primes all dividing $n$, obtain a formula for $d(n, k, A)$
by using Exercise 4.1.12. Specialize this formula to obtain a formula for the Euler phi function
$\varphi(n)$ discussed in Exercise 4.1.5.

(b)  Relax the constraints in (a) by replacing the assumption that the elements in $A$ are primes by
the assumption that no two elements in $A$ have a common factor.

(c)  Relax the constraints in (a) further by not requiring that the elements of $A$ divide $n$.

(d)  Can you relax the constraints in (a) still further by making no assumptions about $A$ and $n$
except that $A$ consists of $m$ integers greater than 1?

4.1.18.  Explain why the real numbers with $x \rho y$ meaning "$x$ is less than $y$" is not a poset.
4.1.19.  Prove that the following are posets.

(a)  The real numbers with $x \rho y$ meaning "$x$ is less than or equal to $y$."

(b)  The real numbers with $x \rho y$ meaning "$x$ is greater than or equal to $y$."

(c)  The subsets of a set with $x \rho y$ meaning $x \subseteq y$.

(d)  The positive integers with $x \rho y$ meaning $y/x$ is an integer.

4.1.20.  Prove that if $(S, \rho)$ is a poset then so is $(S, \tau)$ where $x \tau y$ if and only if $y \rho x$.

4.1.21.  Let $S$ be the set of all partitions of $\underline{n}$. If $x, y \in S$, write $x \rho y$ if and only if every block of $y$ is a
union of one or more blocks of $x$. For example, $\{\{1, 2\}, \{3\}, \{4\}\} \rho \{\{1, 2, 4\}, \{3\}\}$. Prove that this is a
poset.

4.1.22.  Suppose that $(R, \rho)$ and $(T, \tau)$ are posets. Prove that $(R \times T, \pi)$ is a poset if $(r, t) \pi (r', t')$ means that
both $r \rho r'$ and $t \tau t'$ are true.

4.1.23.  We will deduce the result in Exercise 4.1.12 as a consequence of the partially ordered set extension
of the Principle of Inclusion and Exclusion.

(a)  We look at subsets $y$ of $\{1, 2, \ldots, m\}$. Let $e(y)$ be the number of elements in $S$ that belong to
every $S_i$ for which $i \in y$ and to none of the $S_j$ for which $j \notin y$. Prove that $E_k$ is the sum of $e(y)$
over all $y$ of size $k$.

(b)  Prove that if $f(x)$ is defined by (4.14), then

$$f(x) = \Bigl| \bigcap_{i \in x} S_i \Bigr|.$$

(c)  Conclude that (4.15) implies (4.16).

4.2   Listing  Structures  with  Symmetries 


By using decision trees, introduced in Chapter 3, we can produce our list C of canonical
representatives. There are many ways to go about it. We'll illustrate this by some examples. Many of the
examples are based on the Ferris wheel problem of Example 1.12 (p. 13): How many distinct six long
circular sequences of ones and twos are there?


Example  4.7  A  straightforward  method  One  approach  to  the  Ferris  wheel  problem  is  to 
simply  generate  all  sequences  and  reject  those  that  are  equivalent  to  an  earlier  one  in  the  lex  order. 
For  example,  we  would  reject  both  121121  and  211211  because  they  are  equivalent  to  112112,  which 
occurs  earlier  in  lex  order. 

We  can  reduce  the  size  of  the  decision  tree  by  being  careful;  e.g.,  the  sequence  that  starts  1211 . . . 
can  never  be  lexically  least  because  we  could  shift  it  two  positions  to  get  11 ...  12. 

Even  with  these  ideas,  the  decision  tree  is  rather  large.  Hence,  we'll  shorten  the  problem  we've 
been  considering  to  sequences  of  length  4.  The  decision  tree  is  shown  in  Figure  4.2.  It  is  simply  the 
tree  for  generating  all  functions  from  4  to  2  with  those  functions  which  have  a  (lexically)  smaller 
circular  shift  removed.  How  did  we  do  the  removal?  When  we  decided  to  begin  with  2,  there  was  no 
possibility  of  ever  choosing  a  1 — a  circular  shift  would  begin  with  1  and  so  be  smaller.  Also,  if  any 
2's  are  present,  we  can  never  end  with  a  1  because  a  circular  shift  that  moved  it  to  the  front  would 
produce  a  smaller  sequence.  This  rule  was  applied  to  determine  the  possible  decisions  at  112,  121 
and 122. This explains everything that's missing from the full tree for all functions from 4 to 2.

This approach can get rather unwieldy when doing larger problems by hand. Try using it for
the six long Ferris wheel. □
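The straightforward method is easy to mechanize. The following is a minimal sketch, not a pruned decision tree: it simply generates every sequence in lex order and keeps those that are lexically least among their circular shifts. The function names `rotations` and `canonical_sequences` are our own choices for illustration.

```python
from itertools import product

def rotations(seq):
    """Return every circular shift of seq (a tuple)."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]

def canonical_sequences(length, symbols=(1, 2)):
    """List the sequences that are lexically least among their circular shifts."""
    # product() yields tuples in lex order because symbols is sorted.
    return [s for s in product(symbols, repeat=length)
            if s == min(rotations(s))]

for n in (4, 6):
    reps = canonical_sequences(n)
    print(n, len(reps), ["".join(map(str, s)) for s in reps])
```

For length 4 this lists the six leaves of Figure 4.2. A true decision tree would avoid generating the rejected sequences at all, which is where the pruning rules of the example come in.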




Leaves of the decision tree: 1111, 1112, 1122, 1212, 1222, 2222.

Figure 4.2 A Ferris wheel decision tree.


Example 4.8 Another problem Four identical spheres are glued together so that three of them
lie at the vertices of an equilateral triangle and the fourth lies at the center. That is, the centers of
the  spheres  lie  in  a  plane  and  three  of  the  centers  are  at  the  corners  of  an  equilateral  triangle  while 
the  fourth  is  in  the  center.  Thus,  the  sphere  arrangement  remains  unchanged  in  appearance  if  it  is 
flipped  over  about  any  of  three  axes  or  if  it  is  rotated  120  degrees  about  an  axis  that  passes  through 
the  center  of  the  center  sphere  and  is  perpendicular  to  the  plane  of  the  centers.  Draw  yourself  a 
picture  to  illustrate  this — it  is  very  useful  to  get  into  the  habit  of  drawing  pictures  to  help  visualize 
problems  like  this. 

We have four tiny identical red balls and four tiny identical green balls. The balls are to be
placed in the spheres so that each sphere contains exactly two balls. How many arrangements are
possible?

The  calculation  can  be  done  with  the  help  of  a  decision  tree.  The  first  decision  could  be  the 
number  of  red  balls  to  be  placed  in  the  center  sphere.  If  no  red  balls  are  placed  in  the  center  sphere, 
then  two  green  balls  must  be  placed  there  and  two  in  the  outer  spheres.  Those  two  in  the  outer 
spheres  can  either  be  placed  in  the  same  sphere  or  in  different  spheres.  Proceeding  in  this  sort 
of way, we can construct a decision tree. You should do this, verifying that exactly six distinct
arrangements are possible. □
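If you want to check your tree, here is a small sketch that enumerates the arrangements by machine. It rests on one observation that you should verify with your picture: the flips and 120 degree rotations permute the three outer spheres in all six possible ways, so only the multiset of outer contents matters. The function name and its parameters are hypothetical.

```python
from itertools import product

def sphere_arrangements(red=4, per_sphere=2):
    """List arrangements as (red balls in center, sorted red counts in outer spheres)."""
    found = set()
    # Each sphere is described by how many red balls it holds; the rest are green.
    for center, a, b, c in product(range(per_sphere + 1), repeat=4):
        if center + a + b + c == red:
            # Sorting the outer counts identifies arrangements that differ
            # only by a symmetry of the triangle.
            found.add((center, tuple(sorted((a, b, c)))))
    return sorted(found)

for arrangement in sphere_arrangements():
    print(arrangement)
```

The first coordinate corresponds to the first decision suggested above, the number of red balls in the center sphere.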

Example  4.9  A  subtler  method  Another  approach  to  the  Ferris  wheel  problem  is  to  take  into 
account  some  of  the  effects  of  the  symmetry  when  designing  the  decision  tree.  Let's  look  at  our 
six  long  Ferris  wheel. 

The  basic  idea  is  to  look  at  properties  that  depend  only  on  the  circular  sequence  rather  than 
on  how  we  have  chosen  to  write  it  as  a  list.  Unlike  the  simpler  approach  of  listing  things  in  lex 
order,  there  are  a  variety  of  choices  for  constructing  the  decision  tree.  As  a  result,  different  people 
may  construct  different  decision  trees.  Constructing  a  good  tree  may  take  a  fair  bit  of  thought.  Is  it 
worth  the  effort?  Yes,  because  a  good  decision  tree  may  be  considerably  smaller  than  one  obtained 
by  a  more  simplistic  approach. 

Before  reading  further  in  this  example,  construct  a  simple  lex  order  decision  tree  like  the 
one  in  Figure  4.2,  but  for  the  six  long  Ferris  wheel. 

Since  the  number  of  ones  in  a  sequence  remains  the  same,  we  can  partition  the  problem  according 
to  the  number  of  ones  that  appear  in  the  6  long  sequence.  Thus  our  list  of  possible  first  decisions 
could  be 

more 1's than 2's,    three of each,    more 2's than 1's.

We can save ourselves some work right away by noting that all the sequences that arise from the
third choice can be obtained from those of the first choice by replacing 1's with 2's and 2's with 1's.

What should our next decisions be? We'll do something different. Define a function s on sequences
with s(x) equal to the minimal amount the sequence x must be circularly shifted to obtain x again.
(This is called the "period" of the circular sequence.) Thus s(111111) = 1, s(121121) = 3 and
s(122221) = 6. Note that if x and y are equivalent, then s(x) = s(y). You should convince yourself
that x consists of 6/s(x) copies of the first s(x) elements of x. As a result, the only possible values
of s are 1, 2, 3 and 6, and the ratio of 1's to 2's in x is the same as in the first s(x) entries.

Figure 4.3 Another Ferris wheel decision tree.

In this paragraph, we consider the case in which the numbers of 1's and 2's in x are equal. By
the above, s(x) must be even. If s(x) = 2, we need three repeats of a two long pattern that contains
a 1 and a 2. We can take x to be either 121212 or 212121. We'll adopt our usual convention of
using the lexically least equivalent sequence for the canonical sequence, so x = 121212. If s(x) = 6,
the situation is more complex and so another decision will be used to break this case down further.
Again, many choices are possible. We'll use m(x), the length of the longest string of consecutive
ones in the sequence x. Remember to read the list circularly, so m(121211) = 3. Rotate x so that this
longest string of ones is at the start of the list. A little thought should convince you that m(x) = 1
implies x = 121212, which does not have s(x) = 6. Since there are three 1's and three 2's, m(x) must
be 2 or 3. If m(x) = 3, we have x = 111222. If m(x) = 2, we have x = 112??2. The questionable entries
must be a one and a two. Either order works, giving us the two sequences 112122 and 112212.

A similar analysis to that in the previous paragraph can be used for the case in which there are
more 1's than 2's. We leave it to you to carry out the analysis for this case.

The  decision  tree  we  have  developed  is  shown  in  Figure  4.3.  Compare  its  size  with  that  of  the 
simple  lex  order  tree  you  were  asked  to  construct.  Construct  another  decision  tree  using  a  different 
sequence  of  decisions  than  we  did  in  this  example.  Your  goal  should  be  to  come  up  with  something 
different from Figure 4.3 that is about the same size as it is. □
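The three decision criteria of this example (number of ones, s(x) and m(x)) can be computed directly, which gives a way to check a hand-built tree. The sketch below is only an illustration; the helper names `period`, `longest_run_of_ones` and `canonical` are ours.

```python
from itertools import product

def period(x):
    """s(x): the least positive circular shift that gives x again."""
    return next(d for d in range(1, len(x) + 1) if x[d:] + x[:d] == x)

def longest_run_of_ones(x):
    """m(x): the longest string of consecutive 1's, reading x circularly."""
    if set(x) == {"1"}:
        return len(x)
    doubled = x + x  # reading twice around captures runs that wrap past the end
    return max(len(run) for run in doubled.split("2"))

def canonical(n):
    """Lexically least representatives of the n long circular sequences."""
    seqs = ("".join(p) for p in product("12", repeat=n))
    return [x for x in seqs if x == min(x[i:] + x[:i] for i in range(n))]

for x in canonical(6):
    print(x, x.count("1"), period(x), longest_run_of_ones(x))
```

Each printed row gives a canonical sequence followed by the values on which the decisions were based, so rows with the same values would land at the same leaf of the tree.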

It  is  helpful  to  understand  better  the  difference  between  the  two  methods  we've  used  for  the 
Ferris  wheel  problem.  Our  first  method  is  a  straightforward  pruning  of  the  decision  tree  for  listing  all 
functions  in  lex  order.  When  several  functions  correspond  to  the  same  structure,  we  retain  only  the 
lexicographically  least  one  of  them.  The  simplicity  of  the  method  makes  it  fairly  easy  to  program. 
Unfortunately  it  can  lead  to  a  rather  large  tree,  often  containing  many  decisions  that  lead  to  no 
canonical  solutions.  Thus,  although  it  is  straightforward,  using  it  for  hand  calculation  may  lead  to 
errors  because  of  the  amount  of  work  involved. 

Our  second  method  requires  some  ingenuity.  The  basic  idea  is  to  select  some  feature  of  the 
problem  that  lets  us  break  it  into  smaller  problems  of  the  same  basic  type  but  with  additional 
conditions.  Let's  look  at  what  we  did.  First  we  divided  the  problem  into  three  parts  depending  on 
the  number  of  ones  versus  the  number  of  twos.  Each  part  was  a  problem  of  the  same  type;  e.g.,  how 
many  different  arrangements  are  there  where  each  arrangement  has  more  ones  than  twos.  Isn't  this 
what  we  did  in  the  first  method  when  we  made  a  decision  like  "the  first  element  of  the  sequence  is 
1?"  No!  This  is  not  a  problem  of  the  same  type  because  the  condition  is  not  invariant  under  rotation 
of  the  sequence.  On  the  other  hand,  the  condition  that  there  be  more  ones  than  twos  is  invariant 
under  rotation. 

In  our  second  method,  we  next  chose  another  property  that  is  invariant  under  rotation  of  the 
sequence: how much we had to rotate the sequence before it looked the same. Next we looked at the
longest consecutive string of ones, with the sequence read circularly so that the first entry follows
the last. Again, this is invariant under rotation. Sometimes we did not need to go that far because
it was easy to see the solutions; e.g., after s(x) = 1, it was clear that 111111 was the only solution.
On  the  other  hand,  after  the  decision  sequence  =,6,2  in  Figure  4.3,  it  was  still  not  obvious  what 
the  answer  was.  At  this  time  we  decided  it  was  easier  to  shift  back  to  our  first  method  rather  than 
find  another  property  that  was  invariant  under  rotation.  We  did  this  on  scratch  paper  and  simply 
wrote  the  result  as  two  solutions  in  the  figure. 

We  might  call  the  second  method  the  symmetry  invariant  method.  Why  is  symmetry  invariance 
better  than  the  first  method?  When  done  cleverly,  it  leads  to  smaller  decision  trees  and  hence  less 
chance  for  computational  errors.  On  the  other  hand,  you  may  make  mistakes  because  it  is  less 
mechanical  or  you  may  make  poor  selections  for  the  decision  criteria.  If  you  are  applying  symmetry 
invariance,  how  do  you  decide  what  properties  to  select  as  decision  criteria?  Also,  how  do  you  decide 
when  to  switch  back  to  the  first  method?  There  are  no  rules  for  this.  Experience  is  the  best  guide. 

Example  4.10  Listing  necklaces  We'll  work  another  example  using  symmetry  invariance.  How 
many  ways  can  the  corners  of  a  regular  hexagon  be  labeled  using  the  labels  B,  R  and  W,  standing  for 
the  colors  blue,  red  and  white?  Note  that  an  unlabeled  hexagon  can  be  rotated  60°  and/or  flipped 
over  and  still  look  the  same.  You  could  imagine  this  as  a  hexagon  made  from  wire  with  a  round 
bead  to  be  placed  at  each  corner.  We  impose  a  condition  on  the  finished  hexagon  of  beads: 

Adjacent  beads  must  be  different  colors. 

Our  first  decision  will  be  the  number  of  colors  that  actually  appear.  If  only  two  are  used,  there  are 
only  three  solutions:  BRBRBR,  BWBWBW  and  RWRWRW  since  adjacent  colors  must  be  different. 
(We've  used  the  same  sort  of  notation  we  used  for  the  Ferris  wheel.)  If  three  colors  are  used,  we 
decide  how  many  of  each  actually  appear.  The  possibilities  are  2,2,2  and  all  six  permutations  of 
1,2,3.  To  do  the  latter,  we  need  only  consider  the  case  of  1  blue,  2  red  and  3  white  and  then  permute 
the  colors  in  our  solutions  in  all  six  possible  ways.  (To  count,  we  simply  multiply  this  case  by  6.) 

Let's do the 1 blue, 2 red and 3 white case by the first method. A canonical sequence must
start with B and be followed somehow by 2 R's and 3 W's so that adjacent letters are different.
Call the sequence associated with the hexagon $Bx_2x_3x_4x_5x_6$. There are no solutions with $x_2 = R$ or
with $x_6 = R$ because we are not allowed to have two W's adjacent. Thus $x_2 = x_6 = W$. We easily
obtain the single solution BWRWRW. Remember that this gives six solutions through permutation
of colors.

The case in which each color is used twice remains to be done. We make a decision based on
whether the two B's are opposite each other or not on the hexagon. We use the first method now.
The case of opposite B's leads to the sequence $Bx_2x_3Bx_5x_6$ and the other case leads to $By_2By_4y_5y_6$.
In the first case, $x_2$ and $x_3$ are different. This leads to two lexically least sequences: BRWBRW and
BRWBWR. (The sequence BWRBWR is just BRWBRW flipped over.) In the second case, choosing
$y_2$ determines the remaining y's. The two results are BRBWRW and BWBRWR.

Adding up our results, there are 3 + 6 × 1 + (2 + 2) = 13 solutions. □
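A count like this is easy to get wrong by hand, so a mechanical check is reassuring. The sketch below applies the first method by brute force: it keeps a labeling only if adjacent beads differ and it is lexically least among all rotations of itself and of its reversal (reversal standing in for flipping the hexagon over). The function names are our own.

```python
from itertools import product

def dihedral_images(seq):
    """All rotations of seq and of its reversal."""
    n = len(seq)
    images = []
    for s in (seq, seq[::-1]):
        images.extend(s[i:] + s[:i] for i in range(n))
    return images

def proper_necklaces(n=6, colors="BRW"):
    """Canonical labelings of an n-gon in which adjacent beads differ."""
    reps = []
    for p in product(colors, repeat=n):
        s = "".join(p)
        # Adjacent beads, including the last and the first, must differ.
        if any(s[i] == s[(i + 1) % n] for i in range(n)):
            continue
        if s == min(dihedral_images(s)):
            reps.append(s)
    return reps

solutions = proper_necklaces()
print(len(solutions), solutions)
```

Sorting the output by how many times each color appears reproduces the breakdown 3 + 6 + 4 used in the example.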

Exercises 


4.2.1.  Redo  Example  4.10  using  only  the  first  (mechanical)  method. 

4.2.2. How many ways can the eight corners of a regular octagon be labeled using the labels B and W?
Note that an unlabeled octagon can be rotated 45° and/or flipped over and still look the same.

4.2.3. Let F(r) be the number of ways to place beads at the vertices of a square when we are given r different
types of round beads. Let f(r) be the same number except that at least one bead of each of the
r types must be used. Rotations and reflections of the square are allowed as with the hexagon in
Example 4.10.


(b) By evaluating f(r) for r ≤ 4, obtain an explicit formula for F(r).

4.2.4. State and prove a generalization of the formula in the previous exercise that expresses F(r) in
terms of the function f. Possible generalizations are to the hexagon and the n-gon. Can you find a
generalization that has little or no connection with symmetries?

4.2.5. We want to list the coverings of a 4 × 4 board by 8 dominoes, where solutions that differ only by
a rotation and/or reflection of the board are considered to be the same. For example, Figure 3.15
shows 11 ways to cover a 3 × 4 board. With rotation and/or reflection, only 5 are distinct. The lex
order minimal descriptions of the distinct ones are hhhhhh, hhhvvh, hhvhvh, hhvvv and hvvvvh.
List the lexically least coverings of the 4 × 4 board; i.e., our standard choices for canonical
representatives.

4.2.6.  Draw  a  decision  tree  for  covering  the  4x4  board  using  one  each  of  the  shapes  shown  below.  Two 
boards  are  equivalent  if  one  can  be  transformed  to  the  other  by  rotations  and/or  reflections. 

Hint.  First  place  the  "T"  shaped  piece. 


4.2.7. This problem is concerned with listing colorings of the faces of a cube. Unless you are very good at
visualizing in three dimensions, we recommend that you have a cube available to manipulate. (Even
a sugar cube could be used.) Also, when listing solutions, you may find it convenient to represent
the cube in the plane by "unfolding" it as shown and writing the colors in the squares. The line of
four faces can be thought of as the four sides and the other two faces can be thought of as the top and
bottom.


(a)  List  and  count  the  ways  to  color  the  faces  using  at  most  the  2  colors  black  and  white. 

(b)  List  and  count  the  ways  to  color  the  faces  using  at  most  the  two  colors  black  and  white,  with  the 
added  condition  that  we  do  not  distinguish  between  a  cube  and  its  color  negative  (interchanging 
black  and  white). 

(c) List and count the ways to color the faces using at most the two colors black and white, with
the added condition that we cannot distinguish between a cube and its mirror image.

(d)  List  and  count  the  ways  to  color  the  faces  using  all  of  the  colors  black,  red  and  white;  i.e.,  every 
color  must  appear  on  each  cube. 

(e)  Count  the  ways  to  color  the  faces  using  the  colors  black,  red  and  white.  On  any  given  cube,  all 
colors  need  not  appear. 

(f) Find a formula F(r) for the number of ways to color the faces of a cube using r colors so that
whenever two faces are opposite one another they are colored differently.

Hint.  See  Exercise  4.2.4. 


(a) Prove that $F(r) = \binom{r}{1} f(1) + \binom{r}{2} f(2) + \binom{r}{3} f(3) + \binom{r}{4} f(4)$.

4.2.8. We say that f is a "Boolean" function if $f\colon \{0,1\}^n \to \{0,1\}$.

(a) Prove that a Boolean function with n = 2 can be thought of as placing zeroes and ones at the
corners of a 1 × 1 square with lower left corner at the origin. Give a similar interpretation for
n = 3 using a cube.

(b) We want to count the number of "different" Boolean functions with n = 2 and n = 3. Two
functions will be considered equivalent if one can be obtained from the other by permuting the
arguments and/or complementation. We can describe this precisely in an algebraic fashion by
saying that $f, g\colon \{0,1\}^n \to \{0,1\}$ are equivalent if and only if there is a permutation $\sigma$ of $\underline{n}$ and
$c, d_1, \ldots, d_n \in \{0,1\}$ such that


Interpret  the  equivalence  of  Boolean  functions  in  terms  of  symmetries  involving  the  square  and 
cube  when  n  =  2,  3. 

(c) List the different Boolean functions when n = 3.

We've been using "equivalent" rather loosely without saying what it means. Since ambiguous terms
provide an easy way to make errors, we should define it.

Definition 4.2 Equivalence An equivalence relation on a set S is a partition of S. We
say that $s, t \in S$ are equivalent if and only if they belong to the same block of the partition. If
the symbol $\sim$ denotes the equivalence relation, then we write $s \sim t$ to indicate that s and t are
equivalent. An equivalence class is a subset of S consisting of all objects in the set that are
equivalent to some object; i.e., an equivalence class is a block of the partition.

Returning to our circular sequence problem, what do the equivalence classes look like? First,
111111 is in a class by itself because all rotations give us the same sequence again. Likewise, 222222
is in a class by itself. The set {121212, 212121} is a third equivalence class. The sequences
112112 and 122122 are in different equivalence classes, each of which contains 3 sequences. So far, we
have 5 equivalence classes containing a total of 10 sequences. What about the remaining $2^6 - 10 = 54$
sequences? It turns out that they fall into 9 equivalence classes of 6 sequences each. Thus there are
5 + 9 = 14 equivalence classes; that is, the answer to our circular sequence problem is 14.
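The claim about the class sizes can be checked by building the partition explicitly. In this sketch, which uses names of our own choosing, the equivalence class of a sequence is the set of all its circular shifts.

```python
from collections import Counter
from itertools import product

def orbit(x):
    """The equivalence class of x: all circular shifts of x."""
    return frozenset(x[i:] + x[:i] for i in range(len(x)))

# Using a set of frozensets means each class is recorded only once.
classes = {orbit("".join(p)) for p in product("12", repeat=6)}
size_counts = Counter(len(c) for c in classes)

print(len(classes))               # number of equivalence classes
print(sorted(size_counts.items()))  # (class size, how many classes of that size)
```

The tally shows two classes of size 1, one of size 2, two of size 3 and nine of size 6, in agreement with the paragraph above.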

This method is awkward for larger problems. You might try to do 12 long circular sequences of
ones, twos and threes, where the answer is 44,368. Burnside's Lemma allows us to do such problems
more easily. In order to state and prove it we need some observations about the symmetries.

In our problem the symmetries are rotations of the Ferris wheel through 0°, 60°, 120°, 180°,
240° and 300°. These correspond to reading a sequence circularly starting with the first, second,
... and sixth positions, respectively. Let S be the set of all six long sequences of ones and twos.
The six symmetries correspond to six permutations of S by means of the circular reading. For
example, if $g_i$ is the permutation that starts reading in position i + 1, then $g_1(111122) = 111221$
and $g_3(111122) = 122111$. Note that $g_0(x) = x$ for all x. Alternatively, we can think of $g_i$ as shifting
the sequence "circularly" to the left by i positions. The set $G = \{g_0, \ldots, g_5\}$ has some important
properties, namely


where $u \oplus v$ is u + v unless u = v = 1, in which case $u \oplus v = 0$. ("Exclusive or" and "mod 2 sum"
are other names for $u \oplus v$.) For n = 2, there are four different Boolean functions:

(G-1) There is an $e \in G$ such that e(x) = x for all $x \in S$.
(G-2) If $f \in G$, then the inverse of f exists and is in G.
(G-3) If $f, g \in G$, then the composition fg is in G.

The function e is called the "identity" and e is reserved for its name. You should be able to verify
that $g_i^{-1} = g_{6-i}$ (reading $g_6$ as $g_0$) and $g_i g_j = g_k$, where k = i + j if this is less than 6 and
k = i + j − 6 otherwise. Any
set of permutations with properties (G-1), (G-2) and (G-3) is called a permutation group. Group
theory is an important subject that is part of the branch of mathematics called algebra. We barely
touch on it here.
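The verification can also be carried out by machine. The sketch below models each rotation as a function on strings and tests the three group properties on all 64 sequences; the helper names `g`, `equal_on_S` and `compose` are our own, and the check is only as general as this one example.

```python
from itertools import product

S = ["".join(p) for p in product("12", repeat=6)]  # all 64 sequences

def g(i):
    """g_i: read the sequence circularly starting at position i + 1."""
    return lambda x: x[i:] + x[:i]

def equal_on_S(p, q):
    """Two permutations of S are equal when they agree on every sequence."""
    return all(p(x) == q(x) for x in S)

def compose(p, q):
    """The composition pq: apply q first, then p."""
    return lambda x: p(q(x))

identity_ok = equal_on_S(g(0), lambda x: x)                        # (G-1)
inverse_ok = all(equal_on_S(compose(g((6 - i) % 6), g(i)), g(0))   # (G-2)
                 for i in range(6))
closure_ok = all(equal_on_S(compose(g(i), g(j)), g((i + j) % 6))   # (G-3)
                 for i in range(6) for j in range(6))
print(identity_ok, inverse_ok, closure_ok)
```

Reducing i + j modulo 6 is the same as the rule for k stated above.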

Symmetries  always  lead  to  permutation  groups.  Why?  First,  recall  that  a  symmetry  of  some 
thing  is  a  rearrangement  that  leaves  the  thing  looking  the  same  (in  our  case,  the  thing  is  the  empty 
Ferris  wheel).  Taking  the  inverse  corresponds  to  reversing  the  motion  of  the  symmetry,  so  it  again 
leaves  the  thing  looking  the  same.  Taking  a  product  corresponds  to  one  symmetry  followed  by 
another  and  so  leaves  the  thing  looking  the  same. 

What is the connection between the equivalence classes and the permutation group for the
sequences? It is simple: Two sequences $x, y \in S$ are equivalent if and only if y = g(x) for some
$g \in G$. In general, a group G of permutations on a set S defines an equivalence relation in this way.
That requires a bit of proof, which we give at the end of the section. We can now state Burnside's
Lemma, but we defer its proof until the end of the section. In this theorem the expression $\sum_{g \in G} N(g)$
appears. For those unfamiliar with such notation, it means that we must add up the values of N(g)
for all $g \in G$. The order in which we add them does not matter since order is irrelevant in addition.

Theorem 4.5 Burnside's Lemma Let S be a set with a permutation group G. The
number of equivalence classes that G defines on S is

$$\frac{1}{|G|} \sum_{g \in G} N(g),$$

where N(g) is the number of $x \in S$ such that g(x) = x.

Example 4.11 The Ferris wheel generalized We'll redo the Ferris wheel problem and
generalize it.

Burnside's Lemma tells us that the answer to the Ferris wheel problem is

$$\frac{1}{6}\bigl(N(g_0) + N(g_1) + N(g_2) + N(g_3) + N(g_4) + N(g_5)\bigr).$$

Let's compute the terms in the sum. $N(g_0) = 2^6$ since $g_0 = e$, the identity, and there are two
choices for each of the six positions in the sequence. What is $N(g_1)$? If $x = x_1x_2x_3x_4x_5x_6$, then
$g_1(x) = x_2x_3x_4x_5x_6x_1$. Since we want $g_1(x) = x$, we need $x_1 = x_2$, $x_2 = x_3$, ... and $x_6 = x_1$.
In other words, all the $x_i$'s are equal. Thus $N(g_1) = 2$. Since $g_2(x) = x_3x_4x_5x_6x_1x_2$, we find that
$x_1 = x_3 = x_5$ and $x_2 = x_4 = x_6$. Thus $N(g_2) = 2^2 = 4$. You should be able to prove that $N(g_3) = 8$,
$N(g_4) = 4$ and $N(g_5) = 2$. Thus the number of equivalence classes is $\frac{1}{6}(64 + 2 + 4 + 8 + 4 + 2) = 14$.

Now suppose that instead of placing just ones and twos in circular sequences, we have k symbols
to choose from. Our work in the previous paragraph makes it easy for us to write down a formula.
Note that for $g_2(x) = x$ we found that $x_1 = x_3 = x_5$ and $x_2 = x_4 = x_6$. Thus we can choose one
symbol as the value for $x_1 = x_3 = x_5$ and another (possibly the same) symbol as the value for
$x_2 = x_4 = x_6$. Thus $N(g_2) = k^2$. The other $N(g_i)$ values can be determined in a similar manner
giving us

$$\frac{1}{6}\bigl(k^6 + k + k^2 + k^3 + k^2 + k\bigr) = \frac{1}{6}\,k\,(k^5 + k^2 + 2k + 2)$$

different arrangements. When k = 2, we obtain the result from the previous paragraph.
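For small k the formula can be compared with a direct application of Burnside's Lemma in which each N(g_i) is found by brute force. This is a sketch for checking purposes only; the function names are ours, and symbols are represented by the integers 0, ..., k − 1 rather than 1, ..., k.

```python
from itertools import product

def count_fixed(i, k, n=6):
    """N(g_i) by brute force: sequences unchanged by a circular shift of i."""
    return sum(1 for x in product(range(k), repeat=n) if x[i:] + x[:i] == x)

def burnside_count(k, n=6):
    """Average the N(g_i) over the n rotations."""
    return sum(count_fixed(i, k, n) for i in range(n)) // n

def closed_form(k):
    """The right-hand side of the displayed formula for six long sequences."""
    return k * (k**5 + k**2 + 2 * k + 2) // 6

for k in range(1, 5):
    print(k, [count_fixed(i, k) for i in range(6)], burnside_count(k), closed_form(k))
```

The list printed for k = 2 is the sequence of values 64, 2, 4, 8, 4, 2 computed earlier in the example.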

Now let's modify what we just did by adding the requirement that adjacent symbols must be
different. In this case, $N(g_1) = 0$ because $g_1(x) = x$ requires that $x_1 = x_2$, which is forbidden. For
$g_3(x) = x$, we have $x_1 = x_4$, $x_2 = x_5$ and $x_3 = x_6$. Since the symbols assigned to $x_1 = x_4$, $x_2$ and $x_3$
must all be different, $N(g_3) = k(k-1)(k-2)$. With a bit more work, we find that there are

$$\frac{1}{6}\bigl(N(g_0) + 0 + k(k-1) + k(k-1)(k-2) + k(k-1) + 0\bigr)$$

equivalence classes. The determination of $N(g_0)$ is a bit more difficult. We'll discuss this type of
problem in Section 6.2. The answer is $N(g_0) = (k-1)^6 + k - 1$. Thus the number of equivalence
classes is

$$\frac{1}{6}(k-1)\bigl((k-1)^5 + k^2 + 1\bigr).$$

We can check these calculations a bit by noting that when k = 1 there should be no solutions and
when k = 2 there should be 1. Substitution into our formula does indeed give 0 and 1. □

We can shorten some of the work in the previous example by thinking of permutations a bit
differently. Instead of looking at permutations of 6 long sequences, we can look at the way the
permutation rearranges the positions of the sequence. For example, $g_2(x_1x_2x_3x_4x_5x_6) = x_3x_4x_5x_6x_1x_2$
can be interpreted as saying position 1 is replaced by position 3, position 2 is replaced by
position 4, and so forth. This is a new permutation: Instead of permuting the set of 6 long sequences
it permutes the set $\underline{6} = \{1, \ldots, 6\}$. To emphasize both the difference and the relationship, we use $\gamma$, the Greek
letter corresponding to g. In cycle form, $\gamma_2 = (1,3,5)(2,4,6)$. Thinking of a sequence as a function
$f\colon \underline{6} \to \underline{k}$, the function will be counted by $N(\gamma_2) = N(g_2)$ if and only if it is constant on the cycles
of $\gamma_2$. This is a general result:

Principle In the set of allowed functions, $N(\gamma)$ counts precisely those allowed functions which
are constant on the cycles of $\gamma$; i.e., those functions f such that f(x) = f(y) whenever x and y
are in the same cycle of $\gamma$.

For example, if all functions $f\colon A \to B$ are allowed, $N(\gamma)$ is simply $|B|^c$ where c is the number of
cycles of $\gamma$. In the concluding problem of the last example, not all functions were allowed because
the adjacency constraint requires that $f(i) \ne f(i+1)$ for all i. In that case, computing $N(\gamma)$ is a
bit harder, but the principle can still be used.
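To make the principle concrete, here is a short sketch that finds the cycles of a permutation of positions and then counts the unrestricted functions as |B| raised to the number of cycles. The dictionary representation and the name `cycles` are assumptions of this illustration, not notation from the text.

```python
def cycles(perm):
    """Cycle decomposition of a permutation given as a dict on {1, ..., n}."""
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, j = [], start
        while j not in seen:
            seen.add(j)
            cycle.append(j)
            j = perm[j]
        result.append(tuple(cycle))
    return result

# gamma_2 sends position p to position p + 2, read circularly on 1..6.
gamma_2 = {p: (p + 1) % 6 + 1 for p in range(1, 7)}

k = 2  # number of symbols, i.e. |B|
print(cycles(gamma_2), k ** len(cycles(gamma_2)))
```

With two symbols this gives 2 to the power 2, the value of N(g_2) found in Example 4.11.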

Example 4.12 Counting necklaces How many ways can 4 identical round green beads and
4 identical round red beads be arranged to form a necklace of 8 beads? Due to the nature of the
beads, two necklaces will be the same if one can be obtained from the other by rotation or flipping
over. The symmetries are like those in Example 4.10 (p. 109) except that we now have 8 positions
instead of 6.

We can imagine a necklace as an 8 long sequence and use the idea we just discussed to describe
the permutations. Obviously all that matters for counting purposes is the size of the cycles, not
what they contain. Altogether there are 16 permutations of $\underline{8}$, which are shown here in 5 classes,
where $P_k$ is the set of permutations having k cycles. We've omitted commas separating entries in
the cycles. We leave it to you to check that the list is correct and complete.

$P_8$   (1)(2)(3)(4)(5)(6)(7)(8)

$P_5$   (1)(2,8)(3,7)(4,6)(5)   (1,3)(2)(4,8)(5,7)(6)   (1,5)(2,4)(3)(6,8)(7)
        (1,7)(2,6)(3,5)(4)(8)

$P_4$   (1,5)(2,6)(3,7)(4,8)   (1,2)(3,8)(4,7)(5,6)   (1,4)(2,3)(5,8)(6,7)
        (1,6)(2,5)(3,4)(7,8)   (1,8)(2,7)(3,6)(4,5)

$P_2$   (1,3,5,7)(2,4,6,8)   (1,7,5,3)(2,8,6,4)

$P_1$   (1,2,3,4,5,6,7,8)   (1,4,7,2,5,8,3,6)   (1,8,7,6,5,4,3,2)
        (1,6,3,8,5,2,7,4)

Suppose that $\gamma \in P_8$; i.e., $\gamma = (1) \cdots (8)$. Since each cycle has length 1, we can simply choose
4 cycles to be the green beads. This can be done in $\binom{8}{4} = 70$ ways. Thus $N(\gamma) = 70$. Suppose $\gamma \in P_5$.
We can either choose 2 of the 2-cycles to be green beads OR choose both 1-cycles and one of the
2-cycles. Thus $N(\gamma) = \binom{3}{2} + \binom{3}{1} = 6$. Similarly, for $P_4$, $P_2$ and $P_1$ we have the values $\binom{4}{2} = 6$, 2
and 0, respectively, for $N(\gamma)$. Thus the number of necklaces is

$$\frac{1}{16}(70 + 6 \times 4 + 6 \times 5 + 2 \times 2) = 8.$$

Since there were only a few solutions, it probably would have been easier to count them by first
listing them. Do it. Unfortunately, it is usually not easy to tell in advance that there will be so few
solutions that listing them is easier than using Burnside's Lemma. One approach is to start listing
them. If there seem to be too many, your time will probably not have been wasted because you will
have gotten a better feel for the problem. □
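The arithmetic in this example can be confirmed with a short program. The sketch below builds the 16 symmetries as permutations of the positions 0, ..., 7 (rather than 1, ..., 8 as in the text) and, for each one, counts the placements of the 4 green beads that it leaves unchanged. The function names are our own.

```python
from itertools import combinations

def dihedral_group(n):
    """The n rotations and n reflections of positions 0..n-1,
    each given as a tuple p with p[i] the image of position i."""
    rotations = [tuple((i + r) % n for i in range(n)) for r in range(n)]
    reflections = [tuple((r - i) % n for i in range(n)) for r in range(n)]
    return rotations + reflections

def fixed_count(p, green=4):
    """N(gamma): sets of green positions that p maps onto themselves."""
    n = len(p)
    return sum(1 for chosen in combinations(range(n), green)
               if {p[i] for i in chosen} == set(chosen))

group = dihedral_group(8)
necklaces = sum(fixed_count(p) for p in group) // len(group)
print(sorted(fixed_count(p) for p in group), necklaces)
```

The sorted list of fixed counts contains one 70, nine 6's, two 2's and four 0's, matching the classes $P_8$, $P_5$ and $P_4$ together, $P_2$ and $P_1$.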


*  Proofs 


We'll  conclude  this  section  with  the  two  proofs  that  we  put  off: 

1.  A  permutation  group  G  on  a  set  S  gives  an  equivalence  relation  on  S. 

2.  Burnside's  Lemma  is  true. 

Our  proofs  will  be  fairly  heavy  in  notation  and  manipulation.  If  this  causes  difficulties  for  you,  it 
may help if you consider what is happening in a simple case. For example, you might look at the
permutations associated with the Ferris wheel problem. You may also need to reread the proofs.

Proof: (Permutation groups give equivalence relations.) To prove this, we must prove that
defining $x, y \in S$ to be equivalent if and only if y = g(x) for some $g \in G$ does indeed give an equivalence
relation. In other words, we must prove that there is a partition of S such that x and y are in the
same block of the partition if and only if y = g(x) for some $g \in G$.

Let

$$B_x = \{\, y \in S \mid y = g(x) \text{ for some } g \in G \,\}. \tag{4.19}$$

We need to know that the set of $B_x$'s forms a partition of S.

(a) We have $x \in B_x$ because x = e(x). Thus every x in S is in at least one block.

(b) We must prove that the blocks are disjoint; that is, if $B_x \cap B_y \ne \emptyset$, then $B_x = B_y$. Suppose that
$z \in B_x \cap B_y$ and $w \in B_x$. Then, by (4.19), there are permutations f, g and h in G such that z = f(x),
z = g(y) and w = h(x). Thus

$$w = h(x) = h(f^{-1}(z)) = h(f^{-1}(g(y))) = (hf^{-1}g)(y).$$

By (G-2) and (G-3), $hf^{-1}g \in G$. Thus $w \in B_y$. We proved that $B_x \subseteq B_y$. Similarly, $B_y \subseteq B_x$
and so $B_x = B_y$. □

Proof: (Burnside's Lemma) Before proving Burnside's Lemma, we'll prove something that will
be needed later in the proof. Let x be some element of S and let $I_x$ be the set of all $g \in G$ such that
g(x) = x. We will prove that

$$|I_x| \cdot |B_x| = |G|, \tag{4.20}$$

where $B_x$ is defined by (4.19).

To  illustrate  this,  consider  our  Ferris  wheel  problem  with  x  =  121212,  we  have  =  {121212, 212121} 
and  Ix  =  {go,  92, 54}  and  \G\  =6.  You  should  look  at  some  other  examples  to  convince  yourself  that 
(4.20)  is  true  in  general. 
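
One way to "look at some other examples" is to let a computer do it. The following sketch checks (4.20) for every $x$, again assuming that $S$ is the set of all 6-long sequences of ones and twos and that $g_i$ is rotation by $i$ positions. The function names are ours and are only illustrative.

```python
from itertools import product

def rotate(seq, i):
    """Cyclically shift the tuple seq by i positions (our model of g_i)."""
    i %= len(seq)
    return seq[i:] + seq[:i]

n = 6                                   # |G| = 6 rotations
S = list(product((1, 2), repeat=n))     # assumed set S

def orbit(x):
    """B_x: everything x can be rotated into."""
    return {rotate(x, i) for i in range(n)}

def stabilizer(x):
    """I_x, recorded as the indices i with g_i(x) = x."""
    return [i for i in range(n) if rotate(x, i) == x]

x = (1, 2, 1, 2, 1, 2)
print(sorted(orbit(x)), stabilizer(x))

# Check |I_y| * |B_y| = |G| for every y in S.
holds_everywhere = all(len(stabilizer(y)) * len(orbit(y)) == n for y in S)
print(holds_everywhere)
```

A check like this is evidence, not a proof; the argument that follows explains why the product is always $|G|$.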

How can we prove (4.20)? We use a trick. Let $F\colon G \to S$ be defined by $F(g) = g(x)$. Be careful: $x$ is fixed and $g$ is the variable. Note that $\mathrm{Image}(F) = B_x$ and $F^{-1}(x) = I_x$. We claim that $|F^{-1}(y)| = |F^{-1}(x)|$ for all $y \in B_x$. The validity of this claim is enough to prove (4.20) because the claim proves that the coimage of $F$, which is a partition of $G$, consists of $|B_x|$ blocks each of size $|F^{-1}(x)|$.

We now prove the claim. Now both $x$ and $y$ are fixed. Since $y \in B_x$, there is some $h \in G$ such that $y = h(x)$. Then

$$\begin{aligned}
F^{-1}(y) &= \{\, g \in G \mid g(x) = y \,\} && \text{by the definition of } F^{-1} \\
&= \{\, g \in G \mid g(x) = h(x) \,\} && \text{since } y = h(x) \\
&= \{\, g \in G \mid (h^{-1}g)(x) = x \,\} && \text{by (G-2)} \\
&= \{\, hk \mid k \in G \text{ and } k(x) = x \,\} && \text{by setting } k = h^{-1}g \\
&= \{\, hk \mid k \in F^{-1}(x) \,\} && \text{by the definition of } F \\
&= hF^{-1}(x).
\end{aligned}$$

This gives us a bijection between $F^{-1}(y)$ and $F^{-1}(x)$. Thus $|F^{-1}(y)| = |F^{-1}(x)|$.

We now prove Burnside's Lemma. The number of equivalence classes is simply the number of distinct $B_x$'s. Unfortunately we can't easily get our hands on an entire equivalence class or a canonical representative. The following observation will let us look at all the elements in each equivalence class; i.e., all the elements in $S$.

For any nonempty finite set $T$,

$$\sum_{t \in T} \frac{1}{|T|} = 1.$$

You should be able to prove this easily.

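Let $\mathcal{E}$ be the set of equivalence classes of $S$. Then

$$|\mathcal{E}| = \sum_{B \in \mathcal{E}} 1 = \sum_{B \in \mathcal{E}} \sum_{y \in B} \frac{1}{|B|} = \sum_{B \in \mathcal{E}} \sum_{y \in B} \frac{1}{|B_y|}$$

since $B_y = B$. The last double sum is just $\sum_{y \in S} 1/|B_y|$ because each $y \in S$ belongs to exactly one equivalence class. Let $\chi(P)$ be 1 if the statement $P$ is true and 0 if it is false. This is called a characteristic function. Using the above and (4.20),

$$\begin{aligned}
|\mathcal{E}| = \sum_{y \in S} \frac{1}{|B_y|} = \sum_{y \in S} \frac{|I_y|}{|G|} &= \frac{1}{|G|} \sum_{y \in S} |I_y| \\
&= \frac{1}{|G|} \sum_{y \in S} \Bigl( \sum_{g \in G} \chi(g(y) = y) \Bigr) && \text{by the definition of } I_y; \\
&= \frac{1}{|G|} \sum_{g \in G} \Bigl( \sum_{y \in S} \chi(g(y) = y) \Bigr) && \text{by interchanging summation;} \\
&= \frac{1}{|G|} \sum_{g \in G} N(g) && \text{by the definition of } N(g).
\end{aligned}$$

This completes the proof of Burnside's Lemma. □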
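
The chain of equalities in the proof can be checked numerically in a small case. The sketch below computes the number of equivalence classes three ways: by listing the blocks, by summing $1/|B_y|$ over $y \in S$, and by averaging $N(g)$ over $G$. As before, taking $S$ to be all 6-long sequences of ones and twos and $G$ to be the six rotations is our assumption, and the names are ours.

```python
from fractions import Fraction
from itertools import product

def rotate(seq, i):
    """Cyclically shift the tuple seq by i positions (our model of g_i)."""
    i %= len(seq)
    return seq[i:] + seq[:i]

n = 6
S = list(product((1, 2), repeat=n))   # assumed set S

def block(y):
    """B_y: the equivalence class of y under rotation."""
    return frozenset(rotate(y, i) for i in range(n))

def N(i):
    """N(g_i): the number of y in S with g_i(y) = y."""
    return sum(1 for y in S if rotate(y, i) == y)

# 1. Count the distinct blocks directly.
direct = len({block(y) for y in S})
# 2. Sum 1/|B_y| over all y in S (exact arithmetic with Fraction).
by_blocks = sum(Fraction(1, len(block(y))) for y in S)
# 3. Burnside's Lemma: average the fixed-point counts N(g).
burnside = Fraction(sum(N(i) for i in range(n)), n)

print(direct, by_blocks, burnside)  # prints: 14 14 14
```

The third computation never builds a block, which is the practical point of the lemma: it needs only the counts $N(g)$.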
Exercises


4.3.1. Suppose that you can count only ordered lists and you would like a formula for $C(n, k)$, the number of $k$ element subsets of $\underline{n}$. Let $A$ be the set of all $k$-lists without repeated elements that can be formed from $\underline{n}$. Let $B$ be all subsets of $\underline{n}$. We define $F\colon A \to B$ as follows. For $a \in A$, let $F(a)$ be the set whose elements are the items in the list $a$. By studying the image of $F$ and $|F^{-1}(x)|$ for $x \in \mathrm{Image}(F)$, obtain a formula for $C(n, k)$.

4.3.2.  How  many  8-long  circular  sequences  can  be  made  using  the  ten  digits  0, 1,  2, . . . ,  9  if  no  digit  can 
appear  more  than  once?  Can  you  generalize  your  answer  to  n-long  circular  sequences  when  k  things 
are  available  instead  of  just  ten? 

4.3.3.  Redo  Example  4.12  with  the  numbers  of  beads  changed  from  4  and  4  to  3  and  5.  Use  Burnside's 
Lemma. 

4.3.4.  Redo  Example  4.12  where  there  are  k  colors  of  beads  and  there  are  no  constraints  on  how  often  a 
color  may  be  used. 

4.3.5. Label the vertices of a regular $n$-gon clockwise using the numbers 1 to $n$ in order. We can describe a symmetry of the $n$-gon by a permutation of $\underline{n}$. The set of $n$ symmetries of the $n$-gon that involve just rotating it in the plane about its center is called the cyclic group on $\underline{n}$. The set that also allows flipping the $n$-gon over is called the dihedral group on $\underline{n}$. It contains $2n$ symmetries, including the original $n$ from the cyclic group.

(a) Describe the elements of the cyclic group as permutations in two line form. (There is a very simple description of the second line. You should be able to find it by drawing a picture and rotating it.)

(b) Describe the elements of the dihedral group as permutations in two line form. (There is a simple description of the second line of the additional permutations not in the cyclic group.)

(c)  Describe  the  cycles  of  the  elements  of  the  dihedral  group  that  are  not  in  the  cyclic  group. 

4.3.6.  How  many  ways  can  8  squares  be  colored  green  on  a  4  x  4  board  of  16  squares? 

(a)  Assume  that  the  only  symmetries  that  are  allowed  are  rotations  of  the  board. 

(b)  Assume  that  the  board  can  be  rotated  and  flipped  over. 

4.3.7. Starting with the observation

$$\sum_{y \in S} \sum_{g \in G} \chi(g(y) = y) = \sum_{g \in G} \sum_{y \in S} \chi(g(y) = y),$$

use (4.20) to prove Burnside's Lemma. (This is just a rearrangement of the proof in the text.)

4.3.8. Let $D(n)$ be the number of ways to arrange $n$ dominoes to cover a $2 \times n$ board with no symmetries allowed. Let $d(n)$ be the number of ways to arrange them when rotations and reflections are allowed.

(a) List the coverings that give $D(5) = 8$ and $D(6) = 13$. Describe the coverings in general.

(b) Prove that $D(n)$ is the number of compositions of $n$ where the only allowed parts are 1 and 2.

(c) Prove that $d(n)$ is the number of equivalence classes of compositions of $n$ into ones and twos, where two compositions are equivalent if they are the same or if reading one from left to right is the same as the other read from right to left.

(d) Prove that

$$d(n) = \begin{cases} \tfrac{1}{2}\bigl(D(n) + D(k)\bigr) & \text{if } n = 2k + 1; \\[4pt] \tfrac{1}{2}\bigl(D(n) + D(k) + D(k - 1)\bigr) & \text{if } n = 2k. \end{cases}$$
Notes and References


Alternative discussions of the Principle of Inclusion and Exclusion can be found in the texts by Bogart [3; Ch.3], Stanley [6; Ch.2] and Tucker [7; Ch.8]. Stanley [6; Sec.2.6] uses the "Involution Principle" to give a "bijective" proof of the Principle of Inclusion and Exclusion. A bijective proof of a formula first interprets both sides of a formula as counting something in a simple manner (in particular, no minus signs are normally present). The proof consists of a bijection between the two sets of objects being counted. The Involution Principle is a fairly new technique for proving bijections that was introduced by Garsia and Milne [4]. Exercise 4.1.15 was adapted from [8].

Gian-Carlo Rota [5] introduced Möbius inversion to combinatorialists. A less advanced discussion of Möbius inversion and its applications has been given by Bender and Goldman [1]. Möbius
inversion  is  only  one  of  the  many  aspects  of  partially  ordered  sets  which  have  become  important  in 
combinatorial  theory.  See  Stanley  [6;  Ch.3]  for  an  introduction.  In  turn,  partially  ordered  sets  are 
only  one  of  the  tools  of  modern  algebraic  mathematics  that  are  important  in  combinatorics.  This 
explosive  growth  of  algebraic  methods  in  combinatorics  began  in  the  late  1960's. 

We'll  return  to  the  study  of  objects  with  symmetries  in  Section  11.3,  where  we  connect  a  special 
case  of  Burnside's  Lemma  with  generating  functions.  Among  the  texts  that  discuss  enumeration 
with symmetries are those by Biggs [2; Chs.13,14] and Tucker [7; Ch.9]. Williamson [9; Ch.4] goes
deeper  into  some  aspects  related  to  computer  applications.  The  study  of  objects  with  symmetries 
is  inevitably  tied  to  the  theory  of  permutation  groups,  which  we  have  attempted  to  minimize.  See 
Biggs  [2]  for  more  background  on  group  theory. 
PART II

Graphs 


Graph theory is one of the most widely applicable areas of mathematics. Its concepts and terminology
are  used  in  many  areas  to  help  formulate  and  clarify  ideas.  Graph  theory  theorems  find  application 
in  a  wide  range  of  fields,  particularly  the  newer  scientific  disciplines. 

The notion of a "graph" is deceptively simple: It is a collection of points (called "vertices") that
are  joined  by  lines  (called  "edges").  Often  all  that  matters  about  the  edges  is  which  two  vertices 
they  join,  not  their  length  or  how  they  curve.  This  concept  is  deceptive  because  it  seems  unlikely 
that  such  a  simple,  general  notion  could  have  an  interesting  theory  or  be  of  any  use.  Simplicity  is 
important.  Scientists  often  try  to  find  the  simplest  workable  solution  both  in  formulating  theories  and 
designing  experiments.  Mathematicians  try  to  find  the  simplest  concepts  that  usefully  encompass 
what  they  are  studying.  A  computer  scientist,  being  part  scientist  and  part  mathematician,  also 
looks  for  simplicity.  Why  is  there  this  push  for  simplicity? 

1.  Many  scientists  believe  that  the  underlying  laws  of  the  universe  should  exhibit  elegance  and 
simplicity. 

2.  The  more  complicated  a  construction  is,  the  more  likely  it  is  to  malfunction.  Some  familiar 
examples  of  this  are  experimental  apparatus,  algorithms  and  computer  programs. 

3.  A  simple  concept  is  usually  more  flexible  than  a  complex  one  and  so  can  be  applied  in  more  new 
situations.  Common  examples  in  which  simplicity  is  a  virtue  are  definitions,  scientific  theories 
and  data  structures. 

"Simple"  should  not  be  confused  with  "unsophisticated."  Special  relativity,  complex  variables  and 
context-free  grammars  are  all  simple;  but  none  of  them  are  unsophisticated. 

We'll  introduce  some  of  the  basic  concepts  in  graph  theory  in  Chapter  5  and  then  discuss  some 
theory  and  applications  in  Chapter  6.  To  thoroughly  discuss  applications  of  graphs  in  computer 
science  would  require  a  very  large  book.  Another  would  be  needed  to  discuss  the  purely  mathematical 
aspects  of  graph  theory.  We've  picked  a  variety  of  important  topics  from  different  areas  of  graph 
theory  and  computer  science.  The  topics  we've  chosen  reflect  our  perceptions  of  what  you  should 
learn  and  also  our  own  interests.  These  topics  are 

•  Spanning  trees:  an  important  tool  in  combinatorial  algorithms; 

•  Graph  coloring:  a  subject  with  pretty  results  and  a  relatively  long  history; 

•  Planarity:  a  deep  subject  with  connections  to  graph  coloring; 

•  Flows  in  networks:  an  important  application  of  graphs; 

•  Random  graphs:  the  properties  of  typical  graphs; 

•  Finite  state  machines:  an  important  concept  for  formal  languages  and  compiler  design. 
CHAPTER 5

Basic Concepts in Graph Theory


Introduction 


The  concepts  in  this  chapter  are  essential  for  understanding  later  discussions  involving  graphs,  so  be 
sure  that  you  understand  them.  It  is  not  necessary  to  memorize  all  the  concepts  since  you  can  refer 
back  to  them  if  necessary;  however,  make  sure  that  you  understand  them  when  you  study  them  now 
so  that  referring  back  to  them  will  simply  be  a  memory  refresher,  not  a  new  learning  experience. 

Since  the  basic  concepts  in  Section  2.1  (p.  41)  are  used  in  this  chapter,  you  may  wish  to  review 
them  before  continuing. 

5.1  What  is  a  Graph? 


There  are  various  types  of  graphs,  each  with  its  own  definition.  Unfortunately,  some  people  apply 
the  term  "graph"  rather  loosely,  so  you  can't  be  sure  what  type  of  graph  they're  talking  about  unless 
you  ask  them.  After  you  have  finished  this  chapter,  we  expect  you  to  use  the  terminology  carefully, 
not  loosely.  To  motivate  the  various  definitions,  we'll  begin  with  some  examples. 

Example  5.1  A  computer  network  Computers  are  often  linked  with  one  another  so  that  they 
can  interchange  information.  Given  a  collection  of  computers,  we  would  like  to  describe  this  linkage 
in  fairly  clean  terms  so  that  we  can  answer  questions  such  as  "How  can  we  send  a  message  from 
computer  A  to  computer  B  using  the  fewest  possible  intermediate  computers?" 

We  could  do  this  by  making  a  list  that  consists  of  pairs  of  computers  that  are  connected.  Note 
that  these  pairs  are  unordered  since,  if  computer  C  can  communicate  with  computer  D,  then  the 
reverse  is  also  true.  (There  are  sometimes  exceptions  to  this,  but  they  are  rare  and  we  will  assume 
that  our  collection  of  computers  does  not  have  such  an  exception.)  Also,  note  that  we  have  implicitly 
assumed  that  the  computers  are  distinguished  from  each  other:  It  is  insufficient  to  say  that  "A  PC  is 
connected  to  a  Mac."  We  must  specify  which  PC  and  which  Mac.  Thus,  each  computer  has  a  unique 
identifying  label  of  some  sort. 

For people who like pictures rather than lists, we can put dots on a piece of paper, one for each computer. We label each dot with a computer's identifying label and draw a curve connecting two dots if and only if the corresponding computers are connected. Note that the shape of the curve does not matter (it could be a straight line or something more complicated) because we are only interested in whether two computers are connected or not. Figure 5.1 shows such a picture. Each computer has been labeled by the initials of its owner.

Figure 5.1 Computers connected by networks. Computers (vertices) are indicated by dots (•) with labels. The connections (edges) are indicated by lines. When lines cross, they should be thought of as cables that lie on top of each other, not as cables that are joined.

Recall that $\mathcal{P}_2(V)$ stands for the set of all two element subsets of the set $V$. Based on our computer example we have

Definition 5.1 Simple graph A simple graph $G$ is a set $V$, called the vertices of $G$, and a subset $E$ of $\mathcal{P}_2(V)$ (i.e., a set $E$ of 2 element subsets of $V$), called the edges of $G$. We can represent this by writing $G = (V, E)$.

In our case, the vertices are the computers and a pair of computers is in $E$ if and only if they are connected. □
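
To make Definition 5.1 concrete, here is a small sketch of a simple graph stored as a pair $(V, E)$, together with one way to answer the question posed at the start of the example. The network below is hypothetical (it is not the one in Figure 5.1), the function names are ours, and breadth-first search is our choice of method rather than one described in the text.

```python
from collections import deque
from itertools import combinations

def is_simple_graph(V, E):
    """Check that E is a subset of P_2(V), the 2-element subsets of V."""
    P2 = {frozenset(pair) for pair in combinations(V, 2)}
    return set(E) <= P2

def fewest_intermediates(V, E, start, goal):
    """Breadth-first search. Returns the number of intermediate vertices on a
    shortest route from start to goal, or None if no route exists."""
    neighbors = {v: set() for v in V}
    for edge in E:
        u, w = tuple(edge)
        neighbors[u].add(w)
        neighbors[w].add(u)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            return max(dist[u] - 1, 0)
        for w in neighbors[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return None

# A hypothetical network: edges are unordered pairs, so frozensets.
V = {"A", "B", "C", "D", "E", "F"}
E = {frozenset(p) for p in [("A", "C"), ("C", "D"), ("D", "B"), ("A", "E")]}

print(is_simple_graph(V, E))
print(fewest_intermediates(V, E, "A", "B"))
print(fewest_intermediates(V, E, "A", "F"))
```

Using `frozenset` for an edge reflects the remark that the pairs are unordered: `frozenset(("A", "C"))` and `frozenset(("C", "A"))` are the same edge.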

Example 5.2 Routes between cities Imagine four cities named, with characteristic mathematical charm, A, B, C and D. Between these cities there are various routes of travel, denoted by a, b, c, d, e, f and g. A picture of this situation is shown in Figure 5.2. Looking at it, we see that there are three routes between cities B and C. These routes are named d, e and f. Figure 5.2 is intended to give us a picture of only the interconnections between cities. It leaves out many aspects of the situation that might be of interest to a traveler. For example, the nature of these routes (rough road, freeway, rail, etc.) is not portrayed. Furthermore, unlike a typical map, no claim is made that the picture represents in any way the distances between the cities or their geographical placement relative to each other. The object shown in Figure 5.2 is called a graph. Edges a and b are called parallel, as are edges d, e and f.

Following  our  previous  example,  one  is  tempted  to  list  the  pairs  of  cities  that  are  connected;  in 
other  words,  to  extract  a  simple  graph  from  the  information.  Unfortunately,  this  does  not  describe 
the  problem  adequately  because  there  can  be  more  than  one  route  connecting  a  pair  of  cities;  e.g., 
d, e and f connecting cities B and C in the figure. How can we deal with this? Definition 5.2 is a
precise  definition  of  a  graph  of  the  type  required  to  handle  this  type  of  problem. 
Figure 5.2 Capital letters indicate cities and lower case indicate routes.


Definition 5.2 Graph A graph is a triple $G = (V, E, \varphi)$ where $V$ and $E$ are finite sets and $\varphi$ is a function with range $\mathcal{P}_2(V)$ and with domain $E$. We call $E$ the set of edges of the graph $G$. The set $V$ is called the set of vertices of $G$.

In Figure 5.2, $G = (V, E, \varphi)$ where

$$V = \{A, B, C, D\}, \qquad E = \{a, b, c, d, e, f, g\}$$

and

$$\varphi = \begin{pmatrix} a & b & c & d & e & f & g \\ \{A,B\} & \{A,B\} & \{A,C\} & \{B,C\} & \{B,C\} & \{B,C\} & \{B,D\} \end{pmatrix}.$$

Definition 5.2 tells us that to specify a graph $G$ it is necessary to specify the sets $V$ and $E$ and the function $\varphi$. We have just specified $V$ and $\varphi$ in set theoretic terms. Figure 5.2 specifies the same $V$ and $\varphi$ in pictorial terms. The set $V$ is represented clearly in Figure 5.2 by dots (•), each of which has a city name adjacent to it. Similarly, the set $E$ is also represented clearly. The function $\varphi$ is determined from Figure 5.2 by comparing the name attached to a route with the two cities connected by that route. Thus, the route name $d$ is attached to the route with endpoints $B$ and $C$. This means that $\varphi(d) = \{B, C\}$.

Note that, since part of the definition of a function includes its range and domain, $\varphi$ determines $\mathcal{P}_2(V)$ and $E$. Also, $V$ can be determined from $\mathcal{P}_2(V)$. Consequently, we could have said that a graph is a function $\varphi$ whose domain is a finite set and whose range is $\mathcal{P}_2(V)$ for some finite set $V$. Instead, we choose to specify $V$ and $E$ explicitly because the vertices and edges play a fundamental role in thinking about a graph $G$. □

The function $\varphi$ is sometimes called the incidence function of the graph. The two elements of $\varphi(x) = \{u, v\}$, for any $x \in E$, are called the vertices of the edge $x$, and we say $u$ and $v$ are joined by $x$. We also say that $u$ and $v$ are adjacent vertices and that $u$ is adjacent to $v$ or, equivalently, $v$ is adjacent to $u$. For any $v \in V$, if $v$ is a vertex of an edge $x$ then we say $x$ is incident on $v$. Likewise, we say $v$ is a member of $x$, $v$ is on $x$, or $v$ is in $x$. Of course, "$v$ is a member of $x$" actually means $v$ is a member of $\varphi(x)$.

Figure 5.3 shows two other pictorial ways of specifying the same graph as in Figure 5.2. The drawings look very different but exactly the same set $V$ and function $\varphi$ are specified in each case. It is very important that you understand exactly what information is needed to completely specify the graph. When thinking in terms of cities and routes between them, you naturally want the pictorial representation of the cities to represent their geographical positioning also. If the pictorial representation does this, that's fine, but it is not a part of the information required to define a graph. Geographical location is extra information. The geometrical positioning of the vertices $A$, $B$, $C$ and $D$ is very different in Figures 5.2 and 5.3(a). However, in each of these cases, the vertices on a given edge are the same and hence the graphs specified are the same. In Figure 5.3(b) a different method of specifying the graph is given. There, $\varphi^{-1}$, the inverse of $\varphi$, is given. For example, $\varphi^{-1}(\{C, B\})$ is shown to be $\{d, e, f\}$. Knowing $\varphi^{-1}$ determines $\varphi$ and hence determines $G$ since the vertices $A$, $B$, $C$ and $D$ are also specified in Figure 5.3(b).

Figure 5.3 Two alternate pictorial specifications of Figure 5.2.
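
Since a graph is just the data $(V, E, \varphi)$, it can be written down directly in a program. The sketch below stores the incidence function of the cities-and-routes graph as a dictionary, builds $\varphi^{-1}$ as in Figure 5.3(b), and recovers $\varphi$ from it. The dictionary representation and the helper names are our own choices for illustration.

```python
# The graph of Figure 5.2 as a triple (V, E, phi).
V = {"A", "B", "C", "D"}
E = {"a", "b", "c", "d", "e", "f", "g"}
phi = {
    "a": frozenset("AB"), "b": frozenset("AB"), "c": frozenset("AC"),
    "d": frozenset("BC"), "e": frozenset("BC"), "f": frozenset("BC"),
    "g": frozenset("BD"),
}

def inverse(phi):
    """phi^{-1}: send each vertex pair to the set of edges joining it."""
    inv = {}
    for edge, ends in phi.items():
        inv.setdefault(ends, set()).add(edge)
    return inv

def recover(inv):
    """Rebuild phi from phi^{-1}."""
    return {edge: ends for ends, edges in inv.items() for edge in edges}

def incident_edges(phi, v):
    """All edges x that are incident on the vertex v."""
    return {x for x, ends in phi.items() if v in ends}

def adjacent(phi, u, v):
    """True if some edge joins the distinct vertices u and v."""
    return frozenset((u, v)) in phi.values()

inv = inverse(phi)
print(sorted(inv[frozenset("CB")]))      # prints: ['d', 'e', 'f']
print(recover(inv) == phi)               # prints: True
print(sorted(incident_edges(phi, "B")))  # prints: ['a', 'b', 'd', 'e', 'f', 'g']
print(adjacent(phi, "A", "D"))           # prints: False
```

Nothing about where the cities sit on the page appears in this data, which is the point made above: position is extra information, not part of the graph.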

Warning  Some  people  call  our  "simple  graph"  a  "graph"  and  some  people  call  our  "graph"  a 
"multigraph."  Still  other  people  mean  something  somewhat  different  by  the  term  multigraph! 

Example 5.3 Simple graphs are graphs We can easily reconcile our two definitions by realizing that a simple graph is a special case of a graph. Let $G = (V, E)$ be a simple graph. Define $\varphi\colon E \to \mathcal{P}_2(V)$ to be the identity map on $E$; i.e., $\varphi(e) = e$ for all $e \in E$. The graph $G' = (V, E, \varphi)$ is essentially the same as $G$. There is one subtle difference in the pictures: The edges of $G$ are unlabeled but each edge of $G'$ is labeled by a set consisting of the two vertices at its ends. □
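
The construction in Example 5.3 is short enough to write out. In this sketch a simple graph is a pair of Python sets and the incidence function is a dictionary; the function name `as_graph` and the sample graph are hypothetical.

```python
def as_graph(V, E):
    """Turn a simple graph (V, E) into a graph (V, E, phi) by letting
    each edge be its own label: phi(e) = e."""
    phi = {e: e for e in E}
    return V, E, phi

# A hypothetical simple graph on three vertices.
V = {1, 2, 3}
E = {frozenset({1, 2}), frozenset({2, 3})}

V2, E2, phi = as_graph(V, E)
print(all(phi[e] == e for e in E2))                       # prints: True
print(len(set(phi.values())) == len(phi))                 # prints: True
```

The second check reflects the fact that a graph obtained this way can never have parallel edges, because distinct edges have distinct images under $\varphi$.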

There  are  still  more  concepts  that  can  be  called  graphs.  People  may  not  be  interested  in  which 
road  is  which  in  Figure  5.2,  so  the  labels  on  the  edges  are  not  needed.  On  the  other  hand,  some 
people  might  need  more  information,  such  as  how  long  it  takes  to  travel  on  each  of  the  routes  in 
Figure  5.2.  There  may  be  other  situations,  too;  for  example,  some  of  the  routes  may  be  one  way.  We 
will  meet  some  of  these  concepts  later. 


Exercises 


5.1.1. Let $(V, E, \varphi)$ be a graph and $v \in V$ a vertex. Define the degree of $v$, $d(v)$, to be the number of $e \in E$ such that $v \in \varphi(e)$; i.e., $e$ is incident on $v$. Prove that $\sum_{v \in V} d(v) = 2|E|$, an even number. Conclude that the number of vertices $v$ for which $d(v)$ is odd is even.

5.1.2. We are interested in the number of simple graphs with $V = \underline{n}$.

(a) Prove that there are $2^{\binom{n}{2}}$ such simple graphs. (That's 2 to the power $\binom{n}{2}$, not $2\binom{n}{2}$.)

(b)  How  many  of  them  have  exactly  q  edges? 

(c)  If  we  choose  a  simple  n-vertex  graph  uniformly  at  random,  what  is  the  probability  that  it  has 
exactly  q  edges? 


5.1.3. Sometimes it is useful to allow an edge to have both its ends on the same vertex. Let $Q = (V, E, \varphi)$ be a graph where

$$V = \{A, B, C, D, E, F, G, H\}, \qquad E = \{a, b, c, d, e, f, g, h, i, j, k\}$$

and

$$\varphi = \begin{pmatrix} a & b & c & d & e & f & g & h & i & j & k \\ A & A & D & E & A & E & B & F & G & C & A \\ B & D & E & B & B & G & F & G & C & C & A \end{pmatrix}.$$

In this representation of $\varphi$, the first row specifies the edges and the two vertices below each edge specify the vertices incident on that edge. Here is a pictorial representation $P(Q)$ of this graph.

$P(Q)$: [figure not reproduced]
Note that $\varphi(k) = \{A, A\} = \{A\}$. Such an edge is called a loop. Adding a loop to a vertex increases its degree by two. The vertex $H$, which does not belong to $\varphi(x)$ for any edge $x$ (i.e., has no edge incident upon it), is called an isolated vertex. The degree of an isolated vertex is zero. Edges, such as $a$ and $e$ of $Q$, with the property that $\varphi(a) = \varphi(e)$ are called parallel edges. If all edge and vertex labels are removed from $P(Q)$ then we get the following picture $P'(Q)$:

$P'(Q)$: [figure not reproduced]

The picture $P'(Q)$ represents the "form" of the graph just described and is sometimes referred to as a pictorial representation of the "unlabeled" graph associated with $Q$. For each of the following graphs $R$, where $R = (V, E, \varphi)$, $V = \{A, B, C, D, E, F, G, H\}$, draw a pictorial representation of $R$ by starting with $P'(Q)$, removing and/or adding as few edges as possible, and then labeling the resulting picture with the edges and vertices of $R$. A graph $R$ which requires no additions or removals of edges is said to be "of the same form as" or "isomorphic to" the graph $Q$.

(a) Let $E = \{a, b, c, d, e, f, g, h, i, j, k\}$ be the set of edges of $R$ and

$$\varphi = \begin{pmatrix} a & b & c & d & e & f & g & h & i & j & k \\ C & C & F & A & H & E & E & A & D & A & A \\ G & G & G & H & H & H & F & H & G & D & F \end{pmatrix}.$$

(b) Let $E = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11\}$ be the set of edges of $R$ and

$$\varphi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ A & E & E & E & F & G & H & B & C & D & E \\ G & H & E & F & G & H & B & C & D & D & H \end{pmatrix}.$$
5.1.4. Let $Q = (V, E, \varphi)$ be a graph with $|V| = n$. Let $d_1, d_2, \ldots, d_n$, where $d_1 \le d_2 \le \cdots \le d_n$, be the sequence of degrees of the vertices of $Q$, sorted by size. We refer to this sequence as the degree sequence of the graph $Q$. For example, if $Q = (V, E, \varphi)$ is the graph where

$$V = \{A, B, C, D, E, F, G, H\}, \qquad E = \{a, b, c, d, e, f, g, h, i, j, k, l\}$$

and

$$\varphi = \begin{pmatrix} a & b & c & d & e & f & g & h & i & j & k & l \\ A & A & D & E & A & E & B & F & G & C & A & E \\ B & D & E & B & B & G & F & G & C & C & A & G \end{pmatrix},$$

then $(0, 2, 2, 3, 4, 4, 4, 5)$ is the degree sequence of $Q$. Consider the following unlabeled pictorial representation of $Q$:

$P'(Q)$: [figure not reproduced]

(a) Create a pictorial representation of $Q$ by labeling $P'(Q)$ with the edges and vertices of $Q$.

(b) A necessary condition that a pictorial representation of a graph $R$ can be created by labeling $P'(Q)$ with the vertices and edges of $R$ is that the degree sequence of $R$ be $(0, 2, 2, 3, 4, 4, 4, 5)$. True or false? Explain.

(c) A sufficient condition that a pictorial representation of a graph $R$ can be created by labeling $P'(Q)$ with the vertices and edges of $R$ is that the degree sequence of $R$ be $(0, 2, 2, 3, 4, 4, 4, 5)$. True or false? Explain.

5.1.5. In each of the following problems information about the degree sequence of a graph is given. In each case, decide if a graph, without loops, satisfying the specified conditions exists or not. Give reasons in each case.

(a)  A  graph  Q  with  degree  sequence  (1, 1,  2,  3, 3,  5)? 

(b)  A  graph  Q  with  degree  sequence  (1,2,2,3,3,5),  multiple  (i.e.  parallel)  edges  allowed? 

(c)  A  simple  graph  Q  with  degree  sequence  (1,  2, 2,  3,  3,  5)? 

(d)  A  simple  graph  Q  with  degree  sequence  (3,  3,  3,  3)? 

(e)  A  graph  Q  with  degree  sequence  (3, 3,  3,  3),  no  loops  or  parallel  edges  allowed? 

(f)  A  graph  Q  with  degree  sequence  (3, 3,  3,  5),  no  loops  or  parallel  edges  allowed? 

(g)  A  graph  Q  with  degree  sequence  (4,4,4,4,4),  no  loops  or  parallel  edges  allowed? 

(h)  A  graph  Q  with  degree  sequence  (4,4,4,4,6),  no  loops  or  parallel  edges  allowed? 


5.2   Equivalence  Relations  and  Unlabeled  Graphs 


Sometimes we are interested only in the "structure" of a graph and not in the names (labels) of the
vertices  and  edges.  In  this  case  we  are  interested  in  what  is  called  an  unlabeled  graph.  A  picture  of 
an  unlabeled  graph  can  be  obtained  from  a  picture  of  a  graph  by  erasing  all  of  the  names  on  the 
vertices  and  edges.  This  concept  is  simple  enough,  but  is  difficult  to  use  mathematically  because  the 
idea  of  a  picture  is  not  very  precise. 

The  concept  of  an  equivalence  relation  on  a  set  is  an  important  concept  in  mathematics  and 
computer  science.  We  used  the  idea  in  Section  4.3,  but  did  not  discuss  it  much  there.  We'll  explore 
it  more  fully  here  and  will  use  it  to  rigorously  define  unlabeled  graphs.  Later  we  will  use  it  to  define 
connected  components  and  biconnected  components.  We  recall  the  definition  given  in  Section  4.3: 


Definition 5.3 Equivalence relation An equivalence relation on a set $S$ is a partition of $S$. We say that $s, t \in S$ are equivalent if and only if they belong to the same block. If the symbol $\sim$ denotes the equivalence relation, then we write $s \sim t$ to indicate that $s$ and $t$ are equivalent.

Example 5.4 To refresh your memory, we'll look at some simple equivalence relations.

Let  S  be  any  set  and  let  all  the  blocks  of  the  partition  have  one  element.  Two  elements  of  S 
are  equivalent  if  and  only  if  they  are  the  same.  This  rather  trivial  equivalence  relation  is,  of  course, 
denoted  by  "=". 

Now  let  the  set  be  the  integers  Z.  Let's  try  to  define  an  equivalence  relation  by  saying  that  n 
and  k  are  equivalent  if  and  only  if  they  differ  by  a  multiple  of  24.  Is  this  an  equivalence  relation? 
If  it  is  we  should  be  able  to  find  the  blocks  of  the  partition.  There  are  24  of  them,  which  we  could 
number  0, . . . ,  23.  Block  j  consists  of  all  integers  which  equal  j  plus  a  multiple  of  24;  that  is,  they 
have  a  remainder  of  j  when  divided  by  24.  Since  two  numbers  belong  to  the  same  block  if  and 
only  if  they  both  have  the  same  remainder  when  divided  by  24,  it  follows  that  they  belong  to  the 
same  block  if  and  only  if  their  difference  gives  a  remainder  of  0  when  divided  by  24,  which  is  the 
same  as  saying  their  difference  is  a  multiple  of  24.  Thus  this  partition  does  indeed  give  the  desired 
equivalence  relation. 

Now let the set be $\mathbb{Z} \times \mathbb{Z}^*$, where $\mathbb{Z}^*$ is the set of all integers except 0. Write $(a, b) \sim (c, d)$ if and only if $ad = bc$. With a moment's reflection, you should see that this is a way to check if the two fractions $a/b$ and $c/d$ are equal. We can label each equivalence class with the fraction $a/b$ that it represents. In an axiomatic development of the rationals from the integers, one defines a rational number to be just such an equivalence class and proves that it is possible to add, subtract, multiply and divide equivalence classes. We won't pursue this. □
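
The last two relations in this example are easy to experiment with. The sketch below labels each integer with its block number modulo 24 and tests the relation $ad = bc$ on pairs; the function names are ours, and the comparison with Python's `Fraction` type is only a sanity check, not part of the definition.

```python
from fractions import Fraction

def block_of(n, m=24):
    """Block number of n: its remainder on division by m (0, ..., m-1)."""
    return n % m

def differ_by_multiple(n, k, m=24):
    """The relation as stated: n and k differ by a multiple of m."""
    return (n - k) % m == 0

def same_fraction(p, q):
    """(a, b) ~ (c, d) if and only if ad = bc, with b and d nonzero."""
    (a, b), (c, d) = p, q
    return a * d == b * c

# Same block exactly when the difference is a multiple of 24.
numbers = range(-50, 50)
agree = all(
    (block_of(n) == block_of(k)) == differ_by_multiple(n, k)
    for n in numbers for k in numbers
)
print(agree)                              # prints: True

print(same_fraction((1, 2), (-3, -6)))    # prints: True
print(same_fraction((1, 2), (2, 3)))      # prints: False
print(Fraction(1, 2) == Fraction(-3, -6)) # prints: True
```

Python's `%` returns a remainder in the range 0 to 23 even for negative integers, so `block_of` matches the numbering of blocks used above.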

In the next theorem we provide necessary and sufficient conditions for an equivalence relation. Verifying the conditions is a useful way to prove that some particular situation is an equivalence relation. Recall that a binary relation on a set $S$ is a subset $R$ of $S \times S$. Given a binary relation $R$, we will write $s \sim t$ if and only if $(s, t) \in R$. Thus $\sim$ is another way to represent the binary relation.

Theorem 5.1 Equivalence relations Let $S$ be a set and suppose that we have a binary relation $\sim$ on $S$. This is an equivalence relation if and only if the following three conditions hold.

(i) (Reflexive) For all $s \in S$ we have $s \sim s$.

(ii) (Symmetric) For all $s, t \in S$ such that $s \sim t$ we have $t \sim s$.

(iii) (Transitive) For all $r, s, t \in S$ such that $r \sim s$ and $s \sim t$ we have $r \sim t$.

Proof: We first prove that an equivalence relation satisfies (i)-(iii). Suppose that ~ is an
equivalence relation. Since s belongs to whatever block it is in, we have s ~ s. Since s ~ t means that
s and t belong to the same block, we have s ~ t if and only if we have t ~ s. Now suppose that
r ~ s ~ t. Then r and s are in the same block and s and t are in the same block. Thus r and t are
in the same block and so r ~ t.

We  now  suppose  that  (i)-(iii)  hold  and  prove  that  we  have  an  equivalence  relation.  What  would 
the  blocks  of  the  partition  be?  Everything  equivalent  to  a  given  element  should  be  in  the  same  block. 
Thus, for each s ∈ S let B(s) be the set of all t ∈ S such that s ~ t. We must show that the set of
these sets forms a partition of S.

In  order  to  have  a  partition  of  S,  we  must  have 

(a) every t ∈ S is in some B(s) and

(b) for every p, q ∈ S, B(p) and B(q) are either equal or disjoint.

Since ~ is reflexive, s ∈ B(s), proving (a). Suppose x ∈ B(p) ∩ B(q) and y ∈ B(p). We have p ~ x,
q ~ x and p ~ y. By symmetry x ~ p, so q ~ x ~ p ~ y and thus y ∈ B(q), proving that B(p) ⊆ B(q).
Similarly B(q) ⊆ B(p) and so B(p) = B(q). This proves (b). □
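For a finite set, the three conditions and the blocks B(s) from the proof can be computed directly. This is a minimal sketch, assuming the relation is stored as a set R of ordered pairs; the names `is_equivalence` and `blocks` and both sample relations are illustrative choices, not taken from the text.

```python
from itertools import product

def is_equivalence(S, R):
    """Check conditions (i)-(iii) for a relation R, a set of pairs on S."""
    reflexive = all((s, s) in R for s in S)
    symmetric = all((t, s) in R for (s, t) in R)
    transitive = all((r, t) in R
                     for (r, s) in R for (s2, t) in R if s == s2)
    return reflexive and symmetric and transitive

def blocks(S, R):
    """The distinct sets B(s) = {t in S : s ~ t}, as in the proof."""
    return {frozenset(t for t in S if (s, t) in R) for s in S}

S = {0, 1, 2, 3, 4, 5}
# Congruence modulo 3: an equivalence relation.
mod3 = {(s, t) for s, t in product(S, S) if (s - t) % 3 == 0}
# "Differ by at most 1": reflexive and symmetric, but not transitive.
near = {(s, t) for s, t in product(S, S) if abs(s - t) <= 1}
```

For `mod3` the sets B(s) form the partition {0, 3}, {1, 4}, {2, 5}. For `near` the sets B(s) overlap without being equal, so they do not form a partition.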

Suppose we have a picture of a graph G = (V, E, φ), with the elements of V and E written
next to the appropriate vertices and edges in the picture. Suppose that we have another graph
G' = (V', E', φ'). We may be able to erase the elements of V and E and replace them with elements
of V' and E', respectively, so that we obtain a picture of the graph G'. If this is possible, we will
say that G and G' are isomorphic graphs. One can show that this relation (G is isomorphic to
G') satisfies the conditions of Theorem 5.1 and so is an equivalence relation. All graphs which are
isomorphic  to  a  given  graph  correspond  to  the  same  unlabeled  graph.  The  following  definition  and 
example  formulate  these  ideas  more  precisely. 

Definition 5.4 Graph isomorphism Let G = (V, E, φ) and G' = (V', E', φ') be graphs.
We say G and G' are isomorphic, written G ≅ G', if there are bijections

ν: V → V' and ε: E → E'

such that φ'(ε(e)) = ν(φ(e)) for all e ∈ E, where ν({x, y}) is defined to be {ν(x), ν(y)}.

Let G = (V, E) and G' = (V', E') be simple graphs. They are isomorphic if there is a bijection
ν: V → V' such that

{u, v} ∈ E if and only if {ν(u), ν(v)} ∈ E'.


Let's see what our definition means intuitively. Suppose we have a picture of a graph G =
(V, E, φ), with the elements of V and E written next to the appropriate vertices and edges in the
picture. We can replace the vertex set by a new set V', writing the elements of V' where the elements
of V were. The same thing can be done with the edges and a new set E'. This defines two functions
ν: V → V' and ε: E → E'. The condition φ'(ε(e)) = ν(φ(e)) says that we get φ' by simply looking
at the picture to see what the ends of an edge are.
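For small simple graphs, the definition can be tested by trying every bijection ν. The sketch below is a brute-force illustration only (it examines up to |V|! bijections), and the name `isomorphic` and the three sample edge lists are assumptions made here, not part of the text.

```python
from itertools import permutations

def isomorphic(V1, E1, V2, E2):
    """Return a bijection nu (as a dict) witnessing G1 ≅ G2, or None.

    Simple graphs only: each edge is a pair of distinct vertices.
    """
    V1, V2 = list(V1), list(V2)
    E1 = {frozenset(e) for e in E1}
    E2 = {frozenset(e) for e in E2}
    if len(V1) != len(V2) or len(E1) != len(E2):
        return None
    for image in permutations(V2):
        nu = dict(zip(V1, image))
        # Since |E1| = |E2| and nu is injective, mapping every edge of E1
        # into E2 already gives the "if and only if" of the definition.
        if all(frozenset(nu[x] for x in e) in E2 for e in E1):
            return nu
    return None

# Hypothetical examples: a path on four vertices, a relabeled copy, and a star.
path = [(1, 2), (2, 3), (3, 4)]
relabeled = [("a", "c"), ("c", "b"), ("b", "d")]
star = [(1, 2), (1, 3), (1, 4)]

nu = isomorphic([1, 2, 3, 4], path, "abcd", relabeled)
no_map = isomorphic([1, 2, 3, 4], path, [1, 2, 3, 4], star)
```

Here `nu` is a relabeling of the path, while `no_map` is `None` because the star has a vertex of degree 3 and the path does not.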

[Figure 5.4 diagrams not reproduced: the left diagram ends in P_2(V_1) → P_2(V_2); the right diagram ends in P_2(V_1) → P_2(V_2) → P_2(V_3).]

Figure 5.4 Functions involved in proving the equivalence of graphs. The left diagram is used for the
symmetric law. The right diagram is used for the transitive law.


Example 5.5 Unlabeled graphs Let S be the set of all graphs and let G = (V, E, φ) and
G' = (V', E', φ') be in S. We will prove that graph isomorphism is an equivalence relation on S.

Before  we  give  our  proof,  let's  look  at  how  an  equivalence  class  can  be  interpreted.  Draw  a  picture 
of  G  and  then  erase  the  names  of  the  vertices  and  edges.  We  could  call  what  is  left  the  "structure"  of 
G  or,  as  is  more  commonly  done,  an  unlabeled  graph.  The  use  of  an  equivalence  relation  to  define  an 
unlabeled  graph  may  seem  a  bit  round-about:  Why  not  just  be  satisfied  with  the  picture?  Thinking 
about  the  picture  is  fine  for  thinking  about  unlabeled  graphs  and  for  the  proofs  we'll  write;  however, 
there  are  reasons  for  presenting  this  definition: 

•  If  we  wanted  to  use  the  picture  approach  in  a  more  formal  way,  we'd  have  to  say  precisely  what 
a  picture  of  a  graph  was  and  when  two  pictures  represented  the  same  graph.  We  faced  this 
problem  a  bit  in  the  previous  section. 

•  Equivalence  relations  arise  from  isomorphisms  in  many  areas  of  mathematics  and  there  is  often 
no  picture  that  one  can  use  to  describe  the  equivalence  relation.  Thus,  seeing  the  formalism  for 
graphs  is  good  preparation  for  seeing  it  in  other  courses. 

•  If  we  wanted  a  theorem-proving  computer  program  to  work  with  unlabeled  graphs,  we'd  need 
to  give  it  a  formal  definition.  (It  would  be  much  simpler  to  work  with  labeled  graphs.)  This  is 
an  example  of  how  an  intuitively  simple  concept — unlabeled  graphs  in  this  case — can  be  very 
difficult to express in terms that computer software can deal with.

We  now  give  a  formal  proof  that  we  have  an  equivalence  relation. 

First: G ≅ G because we can take ν and ε to be the identity functions; i.e., ν(x) = x for all x ∈ V
and ε(e) = e for all e ∈ E.

Second: Given that G_1 ≅ G_2, we must prove G_2 ≅ G_1, the symmetric law. Let ν_1 and ε_1 be
the bijections guaranteed by the definition of G_1 ≅ G_2. Then ν_1^{-1}: V_2 → V_1 and ε_1^{-1}: E_2 → E_1
are bijections. Let e_2 ∈ E_2 and e_1 = ε_1^{-1}(e_2). Then, by φ_2(ε_1(e_1)) = ν_1(φ_1(e_1)), we have
φ_1(e_1) = ν_1^{-1}(φ_2(ε_1(e_1))) and so

φ_1(ε_1^{-1}(e_2)) = φ_1(e_1) = ν_1^{-1}(φ_2(ε_1(e_1))) = ν_1^{-1}(φ_2(e_2)).

Thus G_2 ≅ G_1.

Figure 5.4 may help you follow what we've just done: The definition of G_1 ≅ G_2 says that
starting at e ∈ E_1, going to E_2 via ε_1 and then to P_2(V_2) via φ_2 ends up at the same pair of vertices
as going from e to P_2(V_1) and then to P_2(V_2) via φ_1 and ν_1, respectively. In other words, the two
routes from E_1 to P_2(V_2) end up in the same place. We proved that the two routes from E_2 to
P_2(V_1) must then also end up in the same place.

Third: We must prove the transitive law. Suppose that G_1 ≅ G_2 ≅ G_3. We then have the bijections
ν_i: V_i → V_{i+1} and ε_i: E_i → E_{i+1} for i = 1, 2. Furthermore, φ_{i+1}(ε_i(e_i)) = ν_i(φ_i(e_i)) for all e_i ∈ E_i.
Let ν(v_1) = ν_2(ν_1(v_1)) and ε(e_1) = ε_2(ε_1(e_1)). It is easily verified that ν and ε are bijections since
ν_i and ε_i are. Finally, since ε_1(e_1) ∈ E_2,

φ_3(ε(e_1)) = φ_3(ε_2(ε_1(e_1))) = ν_2(φ_2(ε_1(e_1))) = ν_2(ν_1(φ_1(e_1))) = ν(φ_1(e_1)).

Thus G_1 ≅ G_3. □

Exercises

5.2.1. Suppose that G = (V, E, φ) and G' = (V', E', φ') are equivalent; i.e., give the same unlabeled graph.
Prove the following.

(a) |V| = |V'| and |E| = |E'|.

(b) d(k, G) = d(k, G') for all k, where d(k, H) is the number of vertices of degree k in H.

5.2.2.  From  our  discussion,  it  may  seem  to  be  an  easy  matter  to  decide  if  two  graphs  represent  the  same 
unlabeled  graph.  This  is  not  true  even  for  relatively  small  graphs.  Divide  the  following  graphs  into 
equivalence  classes  and  justify  your  answer;  i.e.,  explain  why  you  have  the  classes  that  you  do.  In  all 
cases V = {1, 2, 3, 4}.

(a) φ = ( a      b      c      d      e      f     )
        ( {1,2}  {1,2}  {2,3}  {3,4}  {1,4}  {2,4} )

(b) φ = ( A      B      C      D      E      F     )
        ( {1,2}  {1,4}  {1,4}  {1,2}  {2,3}  {3,4} )

(c) φ = ( u      v      w      x      y      z     )
        ( {2,3}  {1,3}  {3,4}  {1,4}  {1,2}  {1,2} )

(d) φ = ( P      Q      R      S      T      U     )
        ( {3,4}  {2,4}  {1,3}  {3,4}  {1,2}  {1,2} )

5.2.3. Let M(n, R) be the n × n matrices over the real numbers.

(a) For two matrices A, B ∈ M(n, R), write A ~ B if and only if there is some nonsingular
P ∈ M(n, R) such that B = PAP^{-1}. Prove that this is an equivalence relation.

(b) For two matrices A, B ∈ M(n, R), write A ~ B if and only if there is some nonsingular
P ∈ M(n, R) such that B = PAP^t, where P^t is the transpose of P. Prove that this is an
equivalence relation.

5.2.4.  Which  of  the  following  define  equivalence  relations?  If  an  equivalence  relation  is  not  defined,  why 
not? 

(a) For all s and t, s ~ t.

(b)  Among  the  students  at  a  university  who  have  selected  precisely  one  major,  two  are  equivalent  if 
and  only  if  they  have  the  same  major. 

(c)  Among  the  students  at  a  university,  two  students  are  equivalent  if  they  have  a  class  in  com- 
mon. 

(d) For the real numbers, two numbers are equivalent if they differ by less than 0.001.

(e)  For  the  real  numbers,  two  numbers  are  equivalent  if  they  agree  in  their  decimal  expansions 
through  the  third  digit  after  the  decimal  place. 




*5.2.5. Define the concept of an equivalence relation for simple graphs so that you can introduce the notion
of an unlabeled simple graph. Prove that you have, indeed, defined an equivalence relation.
Hint. You need only introduce ν, not ε.


5.3     Paths  and  Subgraphs  131 


*5.2.6. We say that an ordinary function f ∈ B^A has its domain A and its range B labeled. For convenience, we
let A = {1, ..., a} and B = {1, ..., b}.

(a) Suppose that f, g ∈ B^A. Write f ~ g if there is a permutation π on A such that f(x) = g(π(x))
for all x ∈ A. Prove that this is an equivalence relation. We call the equivalence class a function
with unlabeled domain. Prove that the set of nondecreasing functions from A to B is a system of
representatives for these equivalence classes; that is, this set contains exactly one function from
each equivalence class.

(b) Using the idea in (a), define the notion of a function with unlabeled range and prove that you
have an equivalence relation. Call a function f: A → B a "restricted growth function" if f(1) = 1
and, for a > j ≥ 1, f(j + 1) is at most one larger than the maximum of f(1), f(2), ..., f(j).
Prove that the restricted growth functions form a system of representatives for the equivalence
classes you have defined.

(c) Using the previous ideas, define the notion of a function with unlabeled domain and range and
prove that you have an equivalence relation. Call a function f: A → B a "partition function"
if f(i) ≤ f(i + 1) for a > i ≥ 1 and |f^{-1}(j)| ≥ |f^{-1}(j + 1)| for b > j ≥ 1. Prove that the
partition functions give one representative from each equivalence class.

*5.2.7.  This  problem  uses  the  ideas  and  notation  from  the  previous  problem.  Construct  a  table  with  four 
rows  marked  with  the  four  possibilities  "A  (un)labeled  and  B  (un)labeled"  and  with  the  columns 
marked  with  "all,"  "injections"  and  "surjections."  Each  of  the  twelve  positions  is  to  be  interpreted 
as the number of (equivalence classes of) functions in B^A satisfying the conditions. We use |A| = a
and |B| = b and use L and U to indicate labeled and unlabeled. The start of a table is shown below.
Verify  these  entries  and  complete  the  table.  How  can  the  number  of  (equivalence  classes  of)  bijections 
be  found  from  the  table? 


A   B   all                injections            surjections
L   L   ?                  b(b-1)···(b-a+1)      ?
L   U   Σ_{k≤b} S(a,k)     ?                     ?
U   L   ?                  ?                     binom(a-1, a-b)
U   U   ?                  1                     ?

An important concept for describing the structure of a graph is the concept of a path.

Definition 5.5 Path, trail, walk and vertex sequence Let G = (V, E, φ) be a graph.
Let e_1, e_2, ..., e_{n-1} be a sequence of elements of E (edges of G) for which there is a sequence
a_1, a_2, ..., a_n of distinct elements of V (vertices of G) such that φ(e_i) = {a_i, a_{i+1}} for i =
1, 2, ..., n-1. The sequence of edges e_1, e_2, ..., e_{n-1} is called a path in G. The sequence of
vertices a_1, a_2, ..., a_n is called the vertex sequence of the path. (Note that since the vertices
are distinct, so are the edges.) If we require that e_1, ..., e_{n-1} be distinct, but not that a_1, ..., a_n
be distinct, the sequence of edges is called a trail. If we do not even require that the edges be
distinct, it is called a walk.

Note that the definition of a path requires that it not intersect itself (i.e., have repeated vertices),
while a trail may intersect itself. Although a trail may intersect itself, it may not have repeated edges,
but a walk may. If P = (e_1, ..., e_{n-1}) is a path in G = (V, E, φ) with vertex sequence a_1, ..., a_n
then we say that P is a path from a_1 to a_n. Similarly for a trail or a walk.

In the graph of Figure 5.2 (p. 123), the sequence c, d, g is a path with vertex sequence A, C, B, D.
If the graph is of the form G = (V, E) with E ⊆ P_2(V), then the vertex sequence alone specifies the
sequence of edges and hence the path. Thus, in Figure 5.1 (p. 122), the vertex sequence MN, SM,
SE, TM specifies the path {MN, SM}, {SM, SE}, {SE, TM}.
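The distinction between path, trail and walk can be checked mechanically for a given sequence of edges. The sketch below assumes φ is stored as a dict from edge names to two-element frozensets (so no loops); the names `vertex_sequences` and `classify` are illustrative. The endpoints listed for a, c, d, e, g are the ones the surrounding text states or implies for Figure 5.2, and the rest of that figure is not reproduced.

```python
def vertex_sequences(phi, edges):
    """All vertex sequences a_1..a_n with phi[e_i] == {a_i, a_{i+1}}."""
    seqs = []
    if not edges:
        return seqs
    for start in phi[edges[0]]:
        seq = [start]
        for e in edges:
            ends = phi[e]
            if seq[-1] not in ends:
                break                      # e does not continue from here
            (other,) = ends - {seq[-1]}    # the other endpoint of e
            seq.append(other)
        else:
            seqs.append(seq)
    return seqs

def classify(phi, edges):
    """Return the most restrictive of 'path', 'trail', 'walk' that applies."""
    seqs = vertex_sequences(phi, edges)
    if not seqs:
        return "not a walk"
    if any(len(set(s)) == len(s) for s in seqs):
        return "path"                      # some vertex sequence has no repeats
    if len(set(edges)) == len(edges):
        return "trail"                     # edges distinct, vertices repeat
    return "walk"

# Part of the edge function of Figure 5.2, as far as the text determines it.
phi = {
    "a": frozenset("AB"), "c": frozenset("AC"), "d": frozenset("BC"),
    "e": frozenset("BC"), "g": frozenset("BD"),
}
```

With this data, `classify(phi, ["c", "d", "g"])` reports a path, `["c", "d", "e"]` is a trail that revisits C, and `["c", "d", "d"]` is only a walk.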

Note  that  every  path  is  a  trail  and  every  trail  is  a  walk,  but  not  conversely.  However,  we  can 
show  that,  if  there  is  a  walk  between  two  vertices,  then  there  is  a  path.  This  rather  obvious  result 
can  be  useful  in  proving  theorems,  so  we  state  it  as  a  theorem. 

Theorem 5.2 Suppose u ≠ v are vertices in G = (V, E, φ). The following are equivalent:

(a) There is a walk from u to v.

(b) There is a trail from u to v.

(c) There is a path from u to v.

Furthermore,  given  a  walk  from  u  to  v,  there  is  a  path  from  u  to  v  all  of  whose  edges  are  in  the 
walk. 

Proof: Since every path is a trail, (c) implies (b). Since every trail is a walk, (b) implies (a). Thus
it suffices to prove that (a) implies (c). Let e_1, e_2, ..., e_k be a walk from u to v. We use induction on
n, the number of repeated vertices in a walk. If the walk has no repeated vertices, it is a path. This
starts the induction at n = 0. Suppose n > 0. If u and/or v is repeated, take the part of the walk
that starts at the last occurrence of u and ends at the first occurrence of v after it. Since this walk has
fewer than n repeated vertices, there is a path by induction. Otherwise, let r be a repeated vertex
different from u and v. Suppose it first appears in edge e_i and last appears in edge e_j. Then
e_1, ..., e_i, e_j, ..., e_k is a walk from u to v in which r is not a repeated vertex. Hence there are
fewer than n repeated vertices in this walk from u to v and so there is a path by induction. Since
we constructed the path by removing edges from the walk, the last statement in the theorem follows. □
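The theorem can also be made constructive. The sketch below does not follow the induction in the proof; it uses a loop-erasing variant instead: scan the walk and, whenever a vertex reappears, discard everything added since its first appearance. The function name `walk_to_path` and the sample walk are assumptions made for illustration.

```python
def walk_to_path(vertices, edges):
    """Shorten a walk to a path using only edges of the walk.

    vertices: a_1, ..., a_n and edges: e_1, ..., e_{n-1}, where e_i joins
    a_i and a_{i+1}. Returns (path_vertices, path_edges).
    """
    path_v, path_e = [vertices[0]], []
    for v, e in zip(vertices[1:], edges):
        if v in path_v:
            # v was visited before: erase the closed detour back to v.
            i = path_v.index(v)
            del path_v[i + 1:]
            del path_e[i:]
        else:
            path_v.append(v)
            path_e.append(e)
    return path_v, path_e

# A hypothetical walk from A to D that goes back and forth between B and C.
walk_vertices = ["A", "C", "B", "C", "B", "D"]
walk_edges = ["c", "d", "e", "d", "g"]
path_vertices, path_edges = walk_to_path(walk_vertices, walk_edges)
```

Every edge kept is an edge of the original walk, which matches the last statement of the theorem.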

Another basic notion is that of a subgraph of G = (V, E, φ), which we will soon define. First we
need some terminology about functions. By a restriction φ' of φ to E' ⊆ E, we mean the function
φ' with domain E' and satisfying φ'(x) = φ(x) for all x ∈ E'.

Definition 5.6 Subgraph Let G = (V, E, φ) be a graph. A graph G' = (V', E', φ') is a
subgraph of G if V' ⊆ V, φ(e') ∈ P_2(V') for all e' ∈ E', and φ' is the restriction of φ to E'
having range P_2(V').

The fact that G' is itself a graph means that φ(x) ∈ P_2(V') for each x ∈ E'.

Example 5.6 For the graph G = (V, E, φ) of Figure 5.2 (p. 123), let G' = (V', E', φ') be defined
by V' = {A, B, C}, E' = {a, b, c, f}, and by φ' being the restriction of φ to E' with range P_2(V').
Notice that φ' is determined completely from knowing V', E' and φ. Thus, to specify a subgraph
G', the key information is V' and E'.

As another example from Figure 5.2, we let V' = V and E' = {a, b, c, f}. In this case, the vertex
D is not a member of any edge of the subgraph. Such a vertex is called an isolated vertex of G'. □

One way of specifying a subgraph is to give a set of edges E' ⊆ E and take V' to be the set of
all vertices on some edge of E'. In other words, V' is the union of the sets φ(x) over all x ∈ E'.
Such a subgraph is called the subgraph induced by E'. The first of Examples 5.6 is the subgraph
induced by E' = {a, b, c, f}. Likewise, given a set V' ⊆ V, we can take E' to be the set of all edges
x ∈ E such that φ(x) ⊆ V'. The resulting subgraph is called the subgraph induced by V'. Referring
to Figure 5.2 (p. 123), the edges of the subgraph induced by V' = {C, B} are E' = {d, e, f}.
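Both kinds of induced subgraph are one-line set computations once φ is stored as a dict. In the sketch below the function names are illustrative, and the edge function is a reconstruction of Figure 5.2 from what the text says about it; the endpoints of b in particular are an assumption, since the text only implies that b lies within {A, B, C} and does not join B and C.

```python
def induced_by_edges(phi, E_sub):
    """Subgraph induced by E': V' is the union of phi[x] over x in E'."""
    V_sub = set().union(*(phi[x] for x in E_sub))
    return V_sub, set(E_sub)

def induced_by_vertices(phi, V_sub):
    """Subgraph induced by V': E' is every edge x with phi[x] inside V'."""
    V_sub = set(V_sub)
    E_sub = {x for x, ends in phi.items() if ends <= V_sub}
    return V_sub, E_sub

# Reconstructed edge function for Figure 5.2 (phi["b"] is assumed).
phi = {
    "a": frozenset("AB"), "b": frozenset("AB"), "c": frozenset("AC"),
    "d": frozenset("BC"), "e": frozenset("BC"), "f": frozenset("BC"),
    "g": frozenset("BD"),
}

by_edges = induced_by_edges(phi, {"a", "b", "c", "f"})
by_vertices = induced_by_vertices(phi, {"C", "B"})
```

In both cases φ' is just φ restricted to the returned edge set, so it need not be stored separately.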

Look again at Figure 5.2. In particular, consider the path c, a with vertex sequence C, A, B.
Notice that the edge d has φ(d) = {C, B}. The subgraph G' = (V', E', φ'), where V' = {C, A, B}
and E' = {c, a, d} is called a cycle of G. In general, whenever there is a path in G, say e_1, ..., e_{n-1}
with vertex sequence a_1, ..., a_n, and an edge x with φ(x) = {a_1, a_n}, then the subgraph induced by
the edges e_1, ..., e_{n-1}, x is called a cycle of G. The formal definition is:

Definition 5.7 Cycle Let G = (V, E, φ) be a graph and let e_1, ..., e_{n-1} be a path with
vertex sequence a_1, ..., a_n. If x is an edge of G such that φ(x) = {a_1, a_n}, then the subgraph
G' of G induced by the set of edges {e_1, ..., e_{n-1}, x} is called a cycle of G. The length of the
cycle is n.

In  our  definitions,  a  path  is  a  sequence  of  edges  but  a  cycle  is  a  subgraph  of  G.  In  actual  practice, 
we will not need to make such fine distinctions, so we may think of a cycle as a path, except that
it  starts  and  ends  at  the  same  vertex.  Cycles  are  closely  related  to  the  existence  of  unique  paths 
between  vertices: 

Theorem 5.3 Two vertices u ≠ v are on a cycle of G if and only if there are two paths from
u to v that have no vertices in common except the endpoints u and v.

Proof: Suppose u and v are on a cycle. Follow the cycle in some direction from u to v to obtain
one path. Then follow the cycle in the opposite direction from u to v to obtain another. Since a cycle
has no repeated vertices, the only vertices that lie in both paths are u and v. On the other hand, a
path from u to v followed by a path from v to u is a cycle if the paths have no vertices in common
other than u and v. □

Definition 5.8 Connected graph Let G = (V, E, φ) be a graph. If for any two distinct
elements u and v of V there is a path P from u to v then G is a connected graph. If |V| = 1,
then G is connected.

We  make  two  observations  about  the  definition. 

•  Because  of  Theorem  5.2,  we  can  replace  "path"  in  the  definition  by  "walk"  or  "trail"  if  we  wish. 

•  The  last  sentence  in  the  definition  is  not  really  needed.  To  see  this,  suppose  \V\  =  1.  Now  G  is 
connected  if,  for  any  two  distinct  elements  u  and  v  of  V,  there  is  a  path  from  u  to  v.  This  is 
trivially satisfied since we cannot find two distinct elements in the one-element set V.

The  graph  of  Figure  5.1  (p.  122)  is  not  connected.  (There  is  no  path  from  EN  to  TM,  for 
example.)  The  subgraph  of  this  graph  induced  by  the  edges  {{SH,  EN},  {EN,  RL},  {EN,  CS}}  is  a 
connected graph with no cycles. Notice in Figure 5.1, that the relation defined on pairs of vertices u, v
by "there exists a path from u to v" partitions the vertices into two subsets: V_1 = {EN, SH, RL, CS}
and V_2 = {MN, SM, SE, TM}. Any two vertices in V_1 can be joined by a path and the same is true
for any two vertices in V_2. There is no path connecting a vertex in V_1 to a vertex in V_2.

This is the case in general for a graph G = (V, E, φ): The vertex set is partitioned into subsets
V_1, V_2, ..., V_m such that if u and v are in the same subset then there is a path from u to v and
if they are in different subsets there is no such path. The subgraphs G_1 = (V_1, E_1, φ_1), ..., G_m =
(V_m, E_m, φ_m) induced by the sets V_1, ..., V_m are called the connected components of G. Every edge
of G appears in one of the connected components. To see this, suppose that {u, v} is an edge and
note that the edge is a path from u to v and so u and v are in the same induced subgraph, G_i. By
the definition of induced subgraph, {u, v} is in G_i.
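The partition into connected components can be computed by a breadth-first search from each vertex not yet reached. This is one standard way to do it, offered as a sketch rather than as the text's method; the function name `connected_components` and the sample graph are hypothetical, and edges are assumed to have two distinct endpoints.

```python
from collections import deque

def connected_components(V, phi):
    """Return a list of (V_i, E_i) pairs, one per connected component."""
    adj = {v: set() for v in V}
    for ends in phi.values():
        u, w = tuple(ends)
        adj[u].add(w)
        adj[w].add(u)
    seen, components = set(), []
    for s in V:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u] - comp:
                comp.add(w)
                queue.append(w)
        seen |= comp
        # E_i: the edges of the subgraph induced by the vertex set comp.
        E_i = {x for x, ends in phi.items() if ends <= comp}
        components.append((comp, E_i))
    return components

# Hypothetical graph: a triangle, a single edge, and an isolated vertex.
V = [1, 2, 3, 4, 5, 6]
phi = {"a": frozenset({1, 2}), "b": frozenset({2, 3}),
       "c": frozenset({1, 3}), "d": frozenset({4, 5})}
components = connected_components(V, phi)
```

Each edge lands in exactly one E_i, in line with the remark that every edge of G appears in one of the connected components.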

*Example 5.7 Connected components as an equivalence relation If you've read Section 2,
you may have realized that the definition of connected components is a bit sloppy: We need to know
that the partitioning into such subsets can actually occur. To see that this is not trivially obvious,
define two integers to be "connected" if they have a common factor greater than 1. Thus 2 and 6 are
connected and 3 and 6 are connected, but 2 and 3 are not connected and so we cannot partition the
set V = {2, 3, 6} into "connected components". We must use some property of the definition of graphs
and paths to show that the partitioning of vertices is possible. One way to do this is to construct an
equivalence relation.

For u, v ∈ V, write u ~ v if and only if either u = v or there is a walk from u to v. We will use
Theorem 5.1 (p. 127) to prove that this is an equivalence relation. It is clear that ~ is reflexive and
symmetric. We now prove that it is transitive. Let u ~ v ~ w. The walk from u to v followed by
the walk from v to w is a walk from u to w. This completes the proof that ~ is an equivalence
relation. The relation partitions V into subsets V_1, ..., V_m. □
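The failure of the "common factor" relation on {2, 3, 6} is a failure of transitivity, and a short search exhibits it. The names below are chosen for illustration; "common factor" is read as a common factor greater than 1.

```python
from math import gcd

def connected(m, n):
    """Two integers are 'connected' if they share a factor greater than 1."""
    return gcd(m, n) > 1

V = [2, 3, 6]
# Triples (r, s, t) with r ~ s and s ~ t but not r ~ t.
violations = [(r, s, t) for r in V for s in V for t in V
              if connected(r, s) and connected(s, t) and not connected(r, t)]
```

The only violations pass through 6, which is connected to both 2 and 3 although 2 and 3 are not connected to each other.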

Exercises 


5.3.1. Let C be the set of courses at the university and S the set of students. Let V = C ∪ S and let
{s, c} ∈ E if and only if student s is enrolled in course c.

(a)  Prove  that  G  =  {V,  E)  is  a  simple  graph. 

(b)  Prove  that  every  cycle  of  G  has  an  even  number  of  edges. 

5.3.2. A graph G = (V, E) is called bipartite if V can be partitioned into two sets A and B such that each
edge has one vertex in A and one vertex in B. (A partition means that A ∪ B = V and A ∩ B = ∅.)

(a)  Prove  that  the  example  in  Exercise  5.3.1  is  a  bipartite  graph. 

(b)  Prove  that  every  cycle  in  a  bipartite  graph  has  even  length. 

(c)  Suppose  that  G  is  a  connected  bipartite  graph.  Develop  an  algorithm  to  partition  the  vertices 
of  G  into  sets  A  and  B  such  that  each  edge  has  one  vertex  in  A  and  one  vertex  in  B.  Prove  that 
your algorithm is correct.

(d)  Extend  your  algorithm  to  all  bipartite  graphs. 

(e) Prove that the number of ways to choose A and B in a bipartite graph with k connected
components is 2^k.

*(f)  Prove  that  a  graph  is  bipartite  if  and  only  if  every  cycle  in  the  graph  has  even  length. 

*5.3.3. A cut edge or isthmus of a connected graph G = (V, E, φ) is an edge e such that the removal of e
from G leaves a graph which is not connected. A cut vertex or articulation point of G is a vertex v
such that the subgraph induced by V − {v} is not connected. For example, in Figure 5.2, edge g is
an isthmus and vertex B is a cut vertex. For this problem, assume that G is simple.

(a) If e is a cut edge of G and v ∈ φ(e) is also on another edge f ≠ e, prove that v is a cut vertex
of G.

(b) Give an example of a connected graph that has a cut vertex but does not have an isthmus.

(c) Prove that an edge e of G is a cut edge if and only if it does not lie on a cycle.
Hint. Look for a path that does not contain e but connects the two vertices in φ(e).

(d) (Challenge) Formulate and prove a result similar to the previous one for cut vertices.




5.3.4. A circuit (or "closed trail") in a graph G = (V, E, φ) is defined exactly as is a cycle in Definition 5.7
except that the "path with vertex sequence a_1, ..., a_n" is replaced by a "trail with vertex sequence
a_1, ..., a_n." In the next section, we'll define a tree as a connected graph without cycles. Suppose
that in this definition and in Definition 5.8 "path" is replaced by "trail" and "cycle" is replaced by
"circuit."  Would  the  new  definitions  of  tree  and  connected  graph  describe  the  same  structures  as  the 
old  definition?  Explain. 

5.3.5. We are going to describe a process for constructing a graph G = (V, E, φ) (with loops allowed). Start
with V = {v_1} consisting of a single vertex and with E = ∅. Add an edge e_1, with φ(e_1) = {v_1, v_2},
to E. If v_1 = v_2, we have a graph with one vertex and one edge, else we have a graph with two
vertices and one edge. Keep track of the vertices and edges in the order added. Here (v_1, v_2) is the
sequence of vertices in the order added and (e_1) is the sequence of edges in order added. Suppose
we continue this process to construct a sequence of vertices (not necessarily distinct) added and
sequence of distinct edges added. At the point where k distinct edges have been added, if v is the last
vertex added, then we add a new edge e_{k+1}, different from all previous edges, with φ(e_{k+1}) = {v, v'}
where either v' is a vertex already added or a new vertex. Here is a picture of this process carried
out with the edges numbered in the order added, where the vertex sequence is

(A, A, B, E, D, A, B, F, G, E, C, C, G)

[Figure not reproduced: the graph built by this process, with edges numbered in the order added.]


Such  a  graph  is  called  Eulerian  or  a  "graph  with  an  Eulerian  trail."  By  construction,  if  G  is  a  graph 
with  an  Eulerian  trail,  then  there  is  a  trail  in  G  that  includes  every  edge  in  G.  If  there  is  a  circuit  in 
G  that  includes  every  edge  of  G  then  G  is  called  an  Eulerian  circuit  graph  or  graph  with  an  Eulerian 
circuit.  Thinking  about  the  above  example,  if  a  graph  has  an  Eulerian  trail  but  no  Eulerian  circuit, 
then  all  vertices  of  the  graph  have  even  degree  except  the  start  vertex  and  end  vertex  of  the  Eulerian 
trail  (they  have  odd  degree).  If  a  graph  has  an  Eulerian  circuit  then  all  vertices  have  even  degree. 
The  converses  in  each  case  are  also  true  (but  take  a  little  work  to  show).  In  each  of  the  following 
graphs,  find  the  longest  trail  (most  edges)  and  longest  circuit.  If  the  graph  has  an  Eulerian  circuit  or 
trail,  say  so. 

5.3.6. Suppose we start with a graph G' = (V, E', φ') that is a cycle and then add additional edges, without
adding any new vertices, to obtain a graph G = (V, E, φ). As an example, consider

[Figure not reproduced: a cycle G' and the graph G obtained from it by adding edges.]

where the first graph G' = (V, E', φ') is the cycle induced by the edges {a, b, c, d, e, f}. A graph that
can be constructed from such a two-step process is called a Hamiltonian graph. The cycle G' is
called a Hamiltonian cycle. Alternatively, a cycle in a graph G = (V, E, φ) is a Hamiltonian cycle
for G if every element of V is a vertex of the cycle. A graph G = (V, E, φ) is Hamiltonian if it has
a subgraph that is a Hamiltonian cycle for G. For each of the following graphs G = (V, E, φ), find a
cycle in G of maximum length. State whether or not the graph is Hamiltonian.

Trees play an important role in a variety of algorithms. We have already met decision trees in
Chapter  3.  In  this  section,  we  define  trees  precisely  and  look  at  some  of  their  properties.  We  study 
trees  further  in  Section  6.1  and  Chapter  9. 

Definition  5.9  (Free)  Tree  If  G  is  a  connected  graph  without  any  cycles  then  G  is  called 
a  tree.  (If \V\  =  1,  then  G  is  connected  and  hence  is  a  tree.)  A  tree  is  also  called  a  free  tree. 

The graph of Figure 5.2 (p. 123) is connected but is not a tree. The subgraph of this graph induced
by the edges {a, e, g} is a tree. If G is a tree, then φ is an injection since if e_1 ≠ e_2 and φ(e_1) = φ(e_2),
then {e_1, e_2} induces a cycle. Because of this, we can think of a tree as a simple graph when we are
not  interested  in  names  of  the  edges. 

It's natural to ask how many trees can be formed using an n-set V for the vertices. In Example 5.10
(p. 143), we'll prove that the answer is n^{n-2}. Another proof is given in Exercise 5.4.12.

Since the notion of a tree is so important, it will be useful to have some equivalent definitions
of a tree. We state them as a theorem.

Theorem  5.4    Definitions  of  tree    If  G  is  a  connected  graph,  the  following  are  equivalent. 

(a)  G  is  a  tree. 

(b)  G  has  no  cycles. 

(c) For every pair of vertices u ≠ v in G, there is exactly one path from u to v.

(d)  Removing  any  edge  from  G  gives  a  graph  which  is  not  connected. 

(e)  The  number  of  vertices  of  G  is  one  more  than  the  number  of  edges  of  G. 


Proof:    By  the  definition  of  a  tree,  (a)  and  (b)  are  equivalent. 

Theorem  5.3  can  be  used  to  prove  that  (b)  and  (c)  are  equivalent.  We  leave  that  as  an  exercise. 

If  {u,v}  is  an  edge,  it  follows  from  (c)  that  the  edge  is  the  only  path  from  u  to  v  and  so 
removing  it  disconnects  the  graph.  Hence  (c)  implies  (d).  We  leave  it  as  an  exercise  to  prove  that 
(d)  implies  (b).  This  shows  that  (a),  (b),  (c),  and  (d)  are  all  equivalent. 

All  that  remains  is  (e). 

We first show that (b) implies (e). We will use induction on the number of vertices of G. If G
has one vertex, it has no edges and so we are done. Otherwise, we claim that G has a vertex u of
degree 1; that is, it lies on only one edge {u,w}. We prove this claim shortly. Remove u and {u,w}
to obtain a graph H with one less edge and one less vertex. Since G is connected and has no cycles,
the same is true of H. Since H has fewer vertices than G, the induction hypothesis tells us that (e)
is true for H: there is one more vertex than edge in H. Since H was obtained from G by removing
one edge and one vertex, (e) is true for G. It remains to prove the existence of u. Suppose no such
u exists; that is, suppose that each vertex lies on at least two edges. We will derive a contradiction.
Start at any vertex $v_1$ of G and leave $v_1$ by some edge $e_1$ to reach another vertex $v_2$. Leave $v_2$ by some
edge $e_2$ different from the edge used to reach $v_2$. Continue with this process. Since each vertex lies
on at least two edges, the process never stops. Since G has only finitely many vertices, we eventually
repeat a vertex, say

$$v_1, e_1, v_2, \ldots, v_k, e_k, \ldots, v_n, e_n, v_{n+1} = v_k.$$

The edges $e_k, \ldots, e_n$ form a cycle, which is a contradiction.

Now  suppose  G  is  a  connected  graph  which  is  not  a  tree.  It  suffices  to  prove  that  G  has  at  least 
as  many  edges  as  it  has  vertices.  Why?  If  we  do  so,  we  will  have  shown 

((a)  is  false)    implies    ((e)  is  false) 

and hence the contrapositive ((e) is true) implies ((a) is true). On with the proof! Since G is not a
tree, (d) fails, so we can
remove an edge from G to get a new graph which is still connected. If this is not a tree, repeat the
process and keep doing so until we reach a tree. Since (a) implies (b) and (b) implies (e), the number
of vertices is now one more than the number of edges. Since we removed edges from G but did not
remove vertices, G must have at least as many edges as vertices. □
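
The equivalences in Theorem 5.4 are easy to test on small examples. The following Python sketch is only an illustration under simplifying assumptions: the graph is given as a list of vertices and a list of two-element edges, and the function names (`is_connected`, `is_tree`, `every_edge_is_a_bridge`) are ours, not the text's. It checks condition (e) together with connectivity, and condition (d) by brute force.

```python
from collections import deque

def is_connected(vertices, edges):
    """Breadth-first search from an arbitrary vertex; True if every vertex is reached."""
    vertices = list(vertices)
    if not vertices:
        return False
    adj = {v: [] for v in vertices}
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return len(seen) == len(vertices)

def is_tree(vertices, edges):
    """Condition (e): a connected graph is a tree iff |V| = |E| + 1."""
    return is_connected(vertices, edges) and len(vertices) == len(edges) + 1

def every_edge_is_a_bridge(vertices, edges):
    """Condition (d): removing any single edge leaves a disconnected graph."""
    edges = list(edges)
    return all(
        not is_connected(vertices, edges[:i] + edges[i + 1:])
        for i in range(len(edges))
    )

# Two small examples on the vertex set {1, 2, 3, 4}.
path = [(1, 2), (2, 3), (3, 4)]   # a tree
cycle = path + [(4, 1)]           # connected, but contains a cycle
```

On the path both tests succeed, and on the 4-cycle both fail, as the theorem predicts for connected graphs.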
Example 5.8 Symmetry in graphs and trees Let G = (V, E) be a simple graph. Suppose
$\nu : V \to V$ is an isomorphism of G to G. Graph isomorphism is defined in Definition 5.4 (p. 128). We
call an isomorphism from something to itself an "automorphism" or a "symmetry." How much can
a symmetry move the vertices of a graph?

It turns out that most graphs have only the trivial automorphism $\nu(v) = v$ for all $v \in V$, so
the vertices can't move at all. On the other hand, if the graph has no edges ($E = \emptyset$) or all possible
edges ($E = \mathcal{P}_2(V)$), then every permutation of the vertices in V is a symmetry since the condition
$\{u, v\} \in E$ if and only if $\{\nu(u), \nu(v)\} \in E$ in Definition 5.4 is easily seen to hold. What about graphs
that are encountered in practice?

The most common graphs in computer science are trees. Most trees have symmetries. For example,
suppose $\{u,v\}, \{u,w\} \in E$ and v and w are leaves in a tree T = (V, E). (Note that V may
have many more vertices besides u, v and w.) We leave it for you to verify that

$$\nu(x) = \begin{cases} w, & \text{if } x = v, \\ v, & \text{if } x = w, \\ x, & \text{otherwise,} \end{cases}$$

is an isomorphism of T. Only the vertices v and w moved. While all vertices might move in an
isomorphism of a tree, there are always some that either don't move or don't move "far." We'll
prove

Theorem 5.5 If $\nu$ is an isomorphism of the tree T = (V, E) then either

(a) there is a vertex v with $\nu(v) = v$ or

(b) there is an edge {u,v} with $\nu(u) = v$ and $\nu(v) = u$.

In other words, either a vertex or an edge does not move.

Proof: Suppose (a) is not true. We'll define a map $f : V \to E$ and use the Pigeonhole Principle,
Theorem 2.5 (p. 55).

If $x \in V$, then $\nu(x) \ne x$. Since T is a tree, there is a unique path from x to $\nu(x)$. Let $f(x)$ be the
first edge on the path. Since $|E| = |V| - 1$, the Pigeonhole Principle tells us that there must be two
vertices $v_1$ and $v_2$ with $f(v_1) = f(v_2)$. Since x is always an end of $f(x)$, $v_1$ and $v_2$ are the ends of an
edge $e = \{v_1, v_2\}$. Since $\nu$
is an isomorphism, $\nu(e) = \{\nu(v_1), \nu(v_2)\}$ is also an edge.

We claim $\nu(e) = e$. Draw a picture and try to understand why this is so. (Remember that paths
between  tree  vertices  are  unique.) 

*        *        *       Stop  and  think  about  this!        *        *  * 

If $\nu(e) \ne e$, here is a way to get from $v_1$ to $v_2$ without using e. Let $P_1$ be the path from $v_1$ to $\nu(v_1)$
and let $P_2$ be the path from $v_2$ to $\nu(v_2)$; both start with the edge e. Start at $v_1$ and follow the part of
$P_2$ after e to $\nu(v_2)$. Traverse the edge
$\nu(e) = \{\nu(v_1), \nu(v_2)\}$ from $\nu(v_2)$ to $\nu(v_1)$. Since $e = \{v_1, v_2\}$ is the first edge on $P_1$, you can follow
$P_1$ backwards until you reach $v_2$. You have just walked from $v_1$ to $v_2$ without using e. Thus there is
a path from $v_1$ to $v_2$ that does not use e. Since there can be only one path from $v_1$ to $v_2$, namely e,
our assumption that $\nu(e) \ne e$ must be wrong. Since (a) is false, $\nu(e) = e$ forces $\nu(v_1) = v_2$ and
$\nu(v_2) = v_1$. Defining $u = v_1$ and $v = v_2$ completes the proof. □

The  decision  trees  in  Chapter  3  have  some  special  properties.  First,  they  have  a  starting  point. 
Second,  the  edges  (decisions)  out  of  each  vertex  are  ordered.  We  now  formalize  these  concepts. 

Definition 5.10 Rooted graph A pair (G, v), consisting of a graph G and a specified vertex
v, is called a rooted graph with root v.

A  rooted  tree  is  sometimes  simply  referred  to  as  a  tree,  but  we  will  never  do  so. 
Figure 5.5 A rooted plane tree with root a at the top, as usual. Linear ordering of siblings is left to right.


Definition 5.11 Parent, child, sibling and leaf Let (T, r) be a rooted tree. If w is any
vertex other than r, let $r = v_0, v_1, \ldots, v_k, v_{k+1} = w$ be the unique path from r to w. We call $v_k$
the parent of w and call w a child of $v_k$. Vertices with the same parent are siblings. A vertex
with no children is a leaf.

Definition 5.12 Rooted plane tree Let (T, r) be a rooted tree. For each vertex, order the
children  of  the  vertex.  The  result  is  a  rooted  plane  tree,  which  we  abbreviate  to  RP-tree. 
An  RP-tree  is  also  called  a  tree.  Parents  and  children  are  also  called  fathers  and  sons. 

Figure 5.5 shows an RP-tree. The sons of b are e and f, the parent of g is d, vertex k has no children,
and the siblings of h are g and i. The decision trees of Chapter 3 are RP-trees. When we draw a
tree as in Figure 5.5, the root is normally the topmost vertex and all edges are directed downward.
In addition, siblings are drawn from left to right following their ordering. For example, the ordering
of the siblings {j, l, k} is j, then k and then l.

It's clear where "rooted" comes from in RP-tree, but where does "plane" come from? When
such a tree is drawn on a piece of paper (a plane), we can start with the root, list its children below
it in order, and so on, just like the picture of a decision tree. On the other hand, any rooted tree
drawn  in  the  plane  has  a  natural  ordering  for  the  children  of  a  vertex  v  provided  v  is  not  the  root: 
Starting  at  the  edge  from  the  parent  of  v,  walk  counterclockwise  around  v,  listing  the  children  of  v 
in  the  order  in  which  their  edges  are  met. 

We now have two different concepts that are referred to as trees: free trees (Definition 5.9)
and RP-trees. How will we keep them straight? When we believe there
is no chance of confusion, we may call them trees; otherwise we will call them free
trees and RP-trees. Sometimes the distinction is not needed. For example, by Theorem 5.4
the statement that a tree has no cycles applies equally well to RP-trees and
free trees since every RP-tree is simply a free tree with a root and orderings.

In  Chapter  9  we'll  discuss  some  recursive  aspects  of  RP-trees  including  tree  traversal  and  gram- 
mars. 

Exercises 


5.4.1.  Complete  the  proof  of  Theorem  5.4. 

5.4.2. Let $G = (V, E, \varphi)$ be a connected graph. Using Theorem 5.4 and its proof, do the following.

(a) Prove that G has a cycle if and only if there is some edge e such that the subgraph of G with
vertices V and edges $E - \{e\}$ is connected.

(b) Prove that there is a subgraph of G which is a tree and has vertex set V. Such a tree is called a
spanning tree.

(c) Prove that $|V| \le |E| + 1$ with equality if and only if G is a tree.
5.4.3. Let T = (V, E) be a tree and let d(v) be the degree of a vertex as defined in Exercise 5.1.1.

(a) Prove that $\sum_{v \in V} (2 - d(v)) = 2$.
Hint. See Exercise 5.1.1.

(b) Prove that, if T has a vertex of degree m > 2, then it has at least m vertices of degree 1. Vertices
of degree 1 are called leaves or terminal vertices.

(c) Give an example for all m > 2 of a tree with a vertex of degree m and only m leaves.

(d) Suppose that T has at most one vertex of degree 2. Prove that over half the vertices of T are
leaves.

(e) Give an example for all m > 0 of a tree with m leaves, m − 1 other vertices and at most one
vertex of degree 2.

5.4.4.  In  this  exercise,  we  study  how  counting  edges  and  vertices  in  a  graph  can  establish  that  cycles  exist. 

(a)  Using  induction  on  n,  prove: 

If  n  >  0,  a  connected  graph  with  v  vertices  and  v  +  n  edges  has  at  least  n  +  1  cycles. 

(b) Prove that a graph with v vertices, e edges and c components has at least c + e − v cycles.
Hint.  Use  (a)  for  each  component. 

(c)  Show  that  (a)  is  best  possible,  even  for  simple  graphs.  In  other  words,  for  each  n  construct  a 
simple  graph  that  has  n  more  edges  than  vertices  but  has  only  n+1  cycles. 

5.4.5.  Prove  that  every  tree  with  at  least  3  vertices  has  a  cut  vertex  and  a  cut  edge.  (The  terms  cut  edge 
and  cut  vertex  are  defined  in  Exercise  5.3.3.) 

5.4.6.  Give  an  example  of  a  graph  that  satisfies  the  specified  condition  or  show  that  no  such  graph  exists. 

(a)  A  tree  with  six  vertices  and  six  edges 

(b) A tree with three or more vertices, two vertices of degree one and all the other vertices with
degree three or more.

(c)  A  disconnected  simple  graph  with  10  vertices,  8  edges  and  a  cycle. 

(d)  A  disconnected  simple  graph  with  12  vertices,  11  edges  and  no  cycles. 

(e)  A  tree  with  6  vertices  and  the  sum  of  the  degrees  of  all  vertices  12. 

(f)  A  connected  simple  graph  with  6  edges,  4  vertices,  and  exactly  2  cycles. 

(g)  A  simple  graph  with  6  vertices,  6  edges  and  no  cycles. 

5.4.7. The height of a rooted tree is the length of the longest path from a leaf of the tree to the root
of the tree. A rooted tree in which each non-leaf vertex has at most two children is called a binary
tree. If each non-leaf vertex has exactly two children, the tree is called a full binary tree.

(a) Show that if a binary tree has l leaves and height h then $l \le 2^h$, or, equivalently, $\log_2(l) \le h$.

(b) Given that a binary tree has l leaves, what can you say about the maximum value of h?

(c) Given a full binary tree with l leaves, what is the maximum height h?

(d) Given a full binary tree with l leaves, what is the minimum height h?

(e) Given a binary tree of l leaves, what is the minimal height h?

5.4.8. Prove that a full binary tree with n leaves has a total of 2n − 1 vertices. (The concept of a full binary
tree  was  defined  in  Exercise  5.4.7.) 

Challenge.  How  many  different  proofs  can  you  find? 

5.4.9.  In  each  of  the  following  cases,  state  whether  or  not  such  a  tree  is  possible. 

(a)  A  binary  tree  with  35  leaves  and  height  100. 

(b)  A  full  binary  tree  with  21  leaves  and  height  21. 

(c)  A  binary  tree  with  33  leaves  and  height  5. 

(d)  A  full  binary  tree  with  65  leaves  and  height  6. 

5.4.10. What is the maximal number of vertices in a rooted tree of height h if every vertex has at most k
children? What is the maximal number of leaves in a rooted tree of height h if every vertex has at
most k children?
5.4.11. We are going to define certain important lists of vertices associated with the rooted plane tree,
Figure 5.5. These lists can, in a similar fashion, be associated with any rooted plane tree. The first
list is abcdefghijkl. This list, called the breadth-first vertex list, is obtained by starting at the root
and reading the vertices of the tree, left to right, as one would read a book. For this definition to
work on any tree, you must imagine the tree drawn with the root at the top and all vertices distance
one from the root drawn at the next level down, all of distance two at the next level, etc. The
second important list is called the depth-first vertex list. The depth-first vertex list for Figure 5.5
is abebfjfkflfbacadgdhdida. Note that each vertex appears in the depth-first vertex list a number
of times equal to one plus the number of children of that vertex. If one extracts the sublist of first
occurrences of each vertex in the depth-first list, one gets the list abefjklcdghi. This list is called the
pre-order vertex list. If one extracts the sublist of last occurrences of each vertex in the depth-first
list, one gets the list ejklfbcghida. This list is called the post-order vertex list.

(a) For the following rooted plane tree, list the breadth-first, depth-first, pre-order, and post-order
vertex  lists. 


[Figure: the rooted plane tree for part (a), with a vertex labeled M.]


(b)  Given  that  the  following  is  the  depth-first  vertex  list  of  a  rooted  plane  tree,  reconstruct  the  tree: 
MKBKLKMIHDHGHFHIEICIM. 

(c)  Is  it  possible  to  reconstruct  a  rooted  plane  tree  given  just  its  pre-order  vertex  list? 

(d) Is it possible to reconstruct a rooted plane tree given its pre-order vertex list and its post-order
vertex list?
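
The four vertex lists defined in Exercise 5.4.11 can be generated mechanically. The Python sketch below is an illustration only: the dictionary `CHILDREN` is our own encoding of Figure 5.5, read off from the lists quoted in the exercise, and all function names are hypothetical. It also recovers the pre-order and post-order lists from the depth-first list by taking first and last occurrences, as the exercise describes.

```python
from collections import deque

# Children of each non-leaf vertex of Figure 5.5, in left-to-right order.
# (An encoding assumed here; it is consistent with the lists given in the text.)
CHILDREN = {
    "a": ["b", "c", "d"],
    "b": ["e", "f"],
    "d": ["g", "h", "i"],
    "f": ["j", "k", "l"],
}

def kids(v):
    return CHILDREN.get(v, [])

def breadth_first(root):
    """Read the tree level by level, left to right."""
    out, queue = [], deque([root])
    while queue:
        v = queue.popleft()
        out.append(v)
        queue.extend(kids(v))
    return "".join(out)

def depth_first(root):
    """List a vertex on arrival and again after returning from each child."""
    out = [root]
    for c in kids(root):
        out.append(depth_first(c))
        out.append(root)
    return "".join(out)

def pre_order(root):
    return root + "".join(pre_order(c) for c in kids(root))

def post_order(root):
    return "".join(post_order(c) for c in kids(root)) + root

def first_occurrences(s):
    """Keep only the first occurrence of each symbol, preserving order."""
    seen, out = set(), []
    for ch in s:
        if ch not in seen:
            seen.add(ch)
            out.append(ch)
    return "".join(out)

def last_occurrences(s):
    """Keep only the last occurrence of each symbol, preserving order."""
    return first_occurrences(s[::-1])[::-1]
```

Calling these functions with root "a" reproduces the four lists stated in the exercise for Figure 5.5.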

*5.4.12. Construct a bijection between functions from $\underline{n-2}$ to $\underline{n}$ and trees with $V = \underline{n}$ as follows. Repeatedly
remove the leaf with the largest label from the tree until only a two vertex tree remains. When a leaf
is removed, list the vertex that it was attached to. This is called the Prüfer sequence for the tree. To
establish the bijection, you must prove that any list of length n − 2 chosen from $\underline{n}$ with repetition
allowed is the Prüfer sequence of a tree.

Hint. Show that the largest number not in the Prüfer sequence is the vertex that was first removed.
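
To experiment with the construction in Exercise 5.4.12, one can code it directly. The following Python sketch is an illustration, not a proof: it assumes n is at least 2 and vertices labeled 1 through n, and the names `prufer_encode`, `prufer_decode` and `round_trip_ok` are ours. Decoding follows the hint: at each step the largest vertex that is still unused and does not occur later in the list must be the leaf that was removed.

```python
from itertools import product

def prufer_encode(n, edges):
    """Repeatedly delete the largest-labeled leaf, recording its neighbor."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    seq = []
    for _ in range(n - 2):
        leaf = max(v for v in adj if len(adj[v]) == 1)
        (nbr,) = adj[leaf]          # a leaf has exactly one neighbor
        seq.append(nbr)
        adj[nbr].discard(leaf)
        del adj[leaf]
    return seq

def prufer_decode(n, seq):
    """Rebuild the edge list of the tree whose sequence is seq."""
    # A vertex occurring k times in seq has degree k + 1 in the tree.
    degree = {v: 1 for v in range(1, n + 1)}
    for a in seq:
        degree[a] += 1
    edges = []
    for a in seq:
        leaf = max(v for v in degree if degree[v] == 1)
        edges.append((leaf, a))
        degree[leaf] -= 1           # the leaf is now removed (degree 0)
        degree[a] -= 1
    u, w = [v for v in degree if degree[v] == 1]
    edges.append((u, w))            # the final two-vertex tree
    return edges

def round_trip_ok(n):
    """Check encode(decode(s)) == s for every list s of length n - 2."""
    return all(
        prufer_encode(n, prufer_decode(n, list(s))) == list(s)
        for s in product(range(1, n + 1), repeat=n - 2)
    )
```

For example, the path 1, 2, 3, 4 has sequence 3, 2: first leaf 4 is removed (attached to 3), then leaf 3 (attached to 2).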
In the next two sections, we take a look at two important modifications of the concept of a graph.

Look  again  at  Figure  5.2  (p.  123).  Imagine  now  that  the  symbols  a,  b,  c,  d,  e,  f  and  g,  instead 
of  standing  for  route  names,  stand  for  commodities  (applesauce,  bread,  computers,  etc.)  that  are 
produced  in  one  town  and  shipped  to  another  town.  In  order  to  get  a  picture  of  the  flow  of  com- 
modities, we  need  to  know  the  directions  in  which  they  are  shipped.  This  information  is  provided 
by  Figure  5.6. 

In set theoretic terms, the information needed to construct Figure 5.6 can be specified by giving
a pair $D = (V, \varphi)$ where $\varphi$ is a function with domain $E = \{a, b, c, d, e, f, g\}$ and range $V \times V$.
Specifically,

$$\varphi = \begin{pmatrix} a & b & c & d & e & f & g \\ (B,A) & (A,B) & (C,A) & (C,B) & (B,C) & (C,B) & (D,B) \end{pmatrix}.$$

The  structure  given  in  Figure  5.6  is  an  example  of  a  directed  graph: 




Definition 5.13 Directed graph A directed graph (or digraph) is a triple $D = (V, E, \varphi)$
where V and E are finite sets and $\varphi$ is a function with domain E and range $V \times V$. We call E
the set of edges of the digraph D and call V the set of vertices of D.

Note that it is possible that $\varphi(x) = (v, v)$ for $v \in V$. Such an edge x is called a loop. A simple
digraph, like a simple graph, is a pair (V, E) where $E \subseteq V \times V$. Subgraphs, paths, walks, trails and
cycles in directed graphs are analogous to the corresponding structures in a graph. A directed graph
$D' = (V', E', \varphi')$ is a directed subgraph of $D = (V, E, \varphi)$ if $V' \subseteq V$, $E' \subseteq E$ and $\varphi'$ is the restriction
of $\varphi$ to $E'$ with range $V' \times V'$. A directed path in the digraph $D = (V, E, \varphi)$ is a sequence of edges
$e_1, \ldots, e_{n-1}$ for which there is a sequence of distinct vertices $a_1, \ldots, a_n$ such that $\varphi(e_i) = (a_i, a_{i+1})$
for $i = 1, 2, \ldots, n-1$. The subdigraph induced by a set of edges $E' \subseteq E$ or the set of vertices
$V' \subseteq V$ is defined in a way analogous to the corresponding concept for graphs. Let $D = (V, E, \varphi)$
be a directed graph and let $e_1, \ldots, e_{n-1}$ with vertex sequence $a_1, \ldots, a_n$ be a directed path. If x is
an edge of D such that $\varphi(x) = (a_n, a_1)$, then the subgraph induced by the edges $\{e_1, \ldots, e_{n-1}, x\}$ is
called a directed cycle in D. For example, the subgraph induced by the edges {c, b, e} is a directed
cycle in the digraph of Figure 5.6. The notions of edge labeling and vertex labeling extend directly to
digraphs. The ideas of connected components and trees are more complicated in digraphs.
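
The definition of a directed cycle can be checked by machine for the digraph of Figure 5.6. In the Python sketch below, the dictionary `PHI` records the function $\varphi$ displayed above as (tail, head) pairs; the function name `is_directed_cycle` and the representation are our own choices, made only for illustration.

```python
# The incidence function of Figure 5.6: edge name -> (tail, head).
PHI = {
    "a": ("B", "A"), "b": ("A", "B"), "c": ("C", "A"), "d": ("C", "B"),
    "e": ("B", "C"), "f": ("C", "B"), "g": ("D", "B"),
}

def is_directed_cycle(phi, edge_names):
    """True if the named edges can be arranged as a directed path
    a_1 -> a_2 -> ... -> a_n on distinct vertices plus an edge a_n -> a_1."""
    arcs = [phi[x] for x in edge_names]
    if not arcs:
        return False
    succ = {}
    for tail, head in arcs:
        if tail in succ:            # two chosen edges leave the same vertex
            return False
        succ[tail] = head
    start = arcs[0][0]
    v = start
    for step in range(1, len(arcs) + 1):
        if v not in succ:           # dead end: no chosen edge leaves v
            return False
        v = succ[v]
        if v == start:              # back at the start: every edge must be used
            return step == len(arcs)
    return False
```

With this encoding, the edges c, b, e trace C to A to B and back to C, so the check succeeds, while d and f (both from C to B) do not form a directed cycle.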

Example 5.9 Digraphs and binary relations Simple digraphs appear in mathematics under
another important guise: binary relations. A binary relation on a set V is simply a subset of $V \times V$.
Often the name of the relation and the subset are the same. Thus we speak of the binary relation
$E \subseteq V \times V$. If you have absorbed all the terminology, you should be able to see immediately that
(V, E) is a simple digraph and that any simple digraph (V', E') corresponds to a binary relation
$E' \subseteq V' \times V'$.

What about simple graphs? We can identify $\{u,v\} \in \mathcal{P}_2(V)$ with $(u,v) \in V \times V$ and with
$(v,u) \in V \times V$. A binary relation R is called "symmetric" if $(u,v) \in R$ implies $(v,u) \in R$. Thus
it seems that simple graphs correspond to symmetric binary relations. This is not quite true since
(u,u) would correspond to a loop. We must either allow loops or only look at symmetric binary
relations that do not contain (u,u).

An equivalence relation on a set S is also a binary relation $R \subseteq S \times S$: We have $(x, y) \in R$ if and
only if x and y are equivalent. Note that this is a symmetric relationship. Which simple graphs (with
loops allowed) correspond to equivalence relations? There is a simple description of them. We'll let
you look for it.

Functions from a set to itself are another special case of binary relations. Figure 5.7 shows a
function and its associated digraph, called a functional digraph. Notice that some of the vertices
form cycles and the remaining vertices form trees that are attached to the cycles, each tree being
attached by one of the vertices on a cycle. For example, the vertices 1, 2, 6, and 9 form a tree that
is attached to a cycle by the vertex 6.

Figure 5.7 The functional digraph associated with the function $\varphi = (6, 6, 3, 7, 3, 4, 6, 4, 2)$.

Figure 5.8 The doubly marked digraph built from the functional digraph of Figure 5.7. Removing the
arrows gives a doubly marked tree.

One can easily read powers (using composition, not multiplication) of a function from the functional
digraph: The function itself comes from directed paths of length 1 and $\varphi^k$ comes from directed
paths of length k.

Permutations are a special case of functions. You should be able to see that, in this case, the
functional digraph consists of cycles with no trees attached. □
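
The remarks about cycles and powers can be tried out in a few lines of Python. This is a sketch under an assumption: the dictionary `PHI` below encodes the function of Figure 5.7 with values chosen to agree with the description in the text (cycles (3) and (4, 7, 6), and a tree on 1, 2, 6, 9 attached at 6). The function names are ours.

```python
# Assumed values for the function of Figure 5.7: x -> phi(x).
PHI = {1: 6, 2: 6, 3: 3, 4: 7, 5: 3, 6: 4, 7: 6, 8: 4, 9: 2}

def power(f, k):
    """The k-fold composition of f with itself; power(f, 0) is the identity.
    The value at x is the end of the directed path of length k starting at x."""
    result = {x: x for x in f}
    for _ in range(k):
        result = {x: f[result[x]] for x in f}
    return result

def cyclic_vertices(f):
    """Vertices lying on directed cycles, found by applying f to the
    whole vertex set until the image stops shrinking."""
    image = set(f)
    while True:
        smaller = {f[x] for x in image}
        if smaller == image:
            return image
        image = smaller

def is_permutation(f):
    """A function is a permutation exactly when every vertex is cyclic."""
    return cyclic_vertices(f) == set(f)
```

For instance, following two arrows from 9 leads through 2 to 6, so the second power sends 9 to 6; and the cyclic vertices of this function are 3, 4, 6 and 7.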

Example 5.10 The number of labeled trees Let $t_n$ be the number of trees with vertex set
$\underline{n}$. It's not hard to draw the possible trees for small values of n. If you do this, you should discover
that $t_1 = 1$, $t_2 = 1$, $t_3 = 3$ and $t_4 = 16$. What is the pattern?

It turns out that $t_n = n^{n-2}$. You might try to check this for $t_5 = 125$. How can we prove this
formula?

When we know the answer to a problem, we can often use some backwards reasoning or "answer
analysis" to figure out how we might solve the problem. Since $n^{n-2}$ is the number of functions
from $\underline{n-2}$ to $\underline{n}$, we might try to find a bijection between such functions and trees. (This is done in
Exercise 5.4.12.) Unfortunately, it is not at all clear how to proceed with this idea.

It would be much nicer if we looked at functions from $\underline{n}$ to $\underline{n}$ because these lead to functional
digraphs: with the function $f \in \underline{n}^{\underline{n}}$, we associate the functional digraph (V, E) where $V = \underline{n}$ and
$E = \{(x, f(x)) \mid x \in \underline{n}\}$. Let's pursue this and see if we can generate some ideas.

Look  at  the  functional  digraph  in  Figure  5.7.  If  we  could  somehow  get  rid  of  the  cycles,  we  would 
have  trees! 

Some inspiration is needed. Here it is. The vertices on the cycles are a permutation drawn in
cyclic form. For example, in Figure 5.7, the permutation is (3)(4, 7, 6). Maybe we can draw the
permutation in some other fashion. We can also write permutations in two line or one line form:

$$\begin{pmatrix} 3 & 4 & 6 & 7 \\ 3 & 7 & 4 & 6 \end{pmatrix} \qquad (3, 7, 4, 6).$$

We could simply take the one line form and construct a directed graph from it:

$$3 \leftarrow 7 \leftarrow 4 \leftarrow 6$$

We  could  also  drag  along  the  other  vertices  that  form  trees  attached  to  these  cyclic  vertices.  This  is 
shown  in  Figure  5.8,  where  you  should  ignore  the  circles  around  the  vertices  for  the  time  being.  We 
have  constructed  a  tree! 

This has one problem: we can no longer tell which vertices were on the cycles. The circles in
Figure 5.8 take care of this, the double circle indicating the start of one line form and the single
circle indicating the end. The unique path from the end to the start gives the one line notation,
written in reverse. (The directed path is unique because we have constructed a tree.)

Actually  our  picture  has  more  information  than  we  need:  The  direction  of  the  edges  is  determined 
by  the  fact  that  they  are  all  directed  toward  the  root,  which  is  3  in  our  example.  Thus  we  can  erase 
the  arrowheads  on  the  edges  with  no  loss  of  information. 

This entire process is reversible: given any tree on $\underline{n}$ in which one vertex is marked with a
double circle and one with a single circle, we can recover a unique function which gives this marked
tree. Note that the same vertex may have the double circle and the single circle. This happens when
there is only one point of the functional digraph on cycles. How many such doubly marked trees are
there? By the bijection, there are $n^n$. Since we form a tree AND mark a vertex with a double circle
AND mark a vertex with a single circle, the number is also $t_n \times n \times n$. Hence $t_n n^2 = n^n$, so
$t_n = n^{n-2}$. This completes the proof. □
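
For small n the formula can be confirmed by brute force, using the fact from Theorem 5.4 that a connected graph on n vertices with n − 1 edges is a tree. The Python sketch below is only a check, not part of the proof; the helper names `connected` and `count_labeled_trees` are ours, and the search is feasible only for very small n.

```python
from itertools import combinations

def connected(n, edges):
    """Union-find test: do the edges join vertices 1..n into one component?"""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    for u, w in edges:
        ru, rw = find(u), find(w)
        if ru != rw:
            parent[ru] = rw
            components -= 1
    return components == 1

def count_labeled_trees(n):
    """Count the (n - 1)-edge subsets of all possible edges that are connected."""
    all_edges = list(combinations(range(1, n + 1), 2))
    return sum(1 for edges in combinations(all_edges, n - 1) if connected(n, edges))
```

Evaluating `count_labeled_trees(n)` for n = 1, ..., 5 gives 1, 1, 3, 16, 125, in agreement with the values quoted in the example.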

Exercises 


5.5.1. If $D = (V, \varphi)$ is a loopless directed graph, the associated graph G(D) is obtained by removing the
directions of the edges. Instead of this rough geometric description, give a definition in terms of sets
and functions.

Hint. Define a function with domain $\{(x, y) \mid x, y \in V \text{ and } x \ne y\}$.

5.5.2. We are interested in the number of simple digraphs with $V = \underline{n}$.

(a)  Find  the  number  of  them. 

(b)  Find  the  number  of  them  with  no  loops. 

(c)  In  both  cases,  find  the  number  of  them  with  exactly  q  edges. 

5.5.3. An oriented simple graph is a simple graph which has been converted to a digraph by assigning an
orientation to each edge. The orientation of {u, v} can be thought of as a mapping of it to either
(u, v) or (v, u). Give an example of a simple digraph that has no loops but is not an oriented simple
graph.

5.5.4.  We  are  interested  in  the  oriented  simple  graphs  with  V  =  n.  (They  are  defined  in  Exercise  5.5.3.) 

(a)  Find  the  number  of  them. 

(b)  Find  the  number  of  them  with  exactly  q  edges. 

5.5.5.  Define  an  equivalence  relation  on  digraphs  that  allows  you  to  introduce  the  notion  of  unlabeled 
digraphs.  Prove  that  you  have,  in  fact,  defined  an  equivalence  relation. 

5.5.6. A digraph is strongly connected if, for every two vertices v and w there is a directed path from v
to w. From any digraph D, we can construct a simple graph S(D) on the same set of vertices by
letting {v, w} be an edge of S(D) if and only if at least one of (v, w) and (w, v) is an edge of D. You
should find the first three parts of this exercise easy, if you understand the meaning of the various
concepts (strongly connected, path, S(D), etc.).

(a)  Prove  that,  if  D  is  strongly  connected,  then  S{D)  is  connected. 

(b)  Construct  an  example  of  a  digraph  D  such  that  D  is  not  strongly  connected  but  S{D)  is  con- 
nected. 

(c) Suppose that $V_1, V_2$ is a partition of the vertices V of a strongly connected digraph D; that is,
$V_1 \ne \emptyset$, $V_2 \ne \emptyset$, $V_1 \cup V_2 = V$, and $V_1 \cap V_2 = \emptyset$. Prove that in D there is an edge from $V_1$ to $V_2$
and an edge from $V_2$ to $V_1$. (This means that for some $v_1, w_1 \in V_1$ and $v_2, w_2 \in V_2$, both $(v_1, v_2)$
and $(w_2, w_1)$ are edges of D.) We will call such a partition of V "2-way joined."

(d) Suppose that every partition $V_1, V_2$ of the vertices of D is 2-way joined. Prove that D is strongly
connected.
5.5.7. Suppose that we are given a set of statements, for example (a) through (e) in Theorem 5.4, and that
we  have  proved  that  some  statements  imply  others.  Construct  a  directed  graph  D  as  follows.  The 
statements  are  the  vertices  V  of  D.  For  statements  v  and  w,  {v,  w)  is  an  edge  of  D  if  and  only  if  we 
have  a  proof  that  statement  v  implies  statement  w.  Prove  the  claim:  "If  D  is  strongly  connected,  we 
have  proved  enough  to  show  that  the  statements  in  V  are  all  equivalent."  (See  Exercise  5.5.6  for  a 
definition  of  strongly  connected.) 

5.5.8. For any subset U of the vertices V of a directed graph D, define $d_{\mathrm{in}}(U)$ to be the number of edges
e of D with $\varphi(e)$ of the form (w, u) where $u \in U$ and $w \notin U$. Define $d_{\mathrm{out}}(U)$ similarly.

(a) For v a single vertex, what is $d_{\mathrm{in}}(\{v\})$ in terms of the picture of D?

(b) Prove that $\sum d_{\mathrm{in}}(\{v\}) = \sum d_{\mathrm{out}}(\{v\})$, where the sums both range over all $v \in V$.

(c) Prove that $\sum_{u \in U} d_{\mathrm{in}}(\{u\})$ equals $d_{\mathrm{in}}(U)$ plus the number of non-loop edges of D that have both
of their endpoints in U.

(d) Suppose that $d_{\mathrm{in}}(\{v\}) = d_{\mathrm{out}}(\{v\})$ for all $v \in V$. Prove that $d_{\mathrm{in}}(U) = d_{\mathrm{out}}(U)$ for all
$U \subseteq V$.

5.5.9. Use the notation of Exercise 5.5.8. Suppose $d_{\mathrm{in}}(\{v\}) = d_{\mathrm{out}}(\{v\})$ for all $v \in V$. Prove that for every
edge e of D there is a directed cycle in D that contains e.

5.5.10. Suppose that S(D) is connected, where S(D) is obtained from D by removing the directions of the
edges. Use the notation of Exercise 5.5.8. Suppose $d_{\mathrm{in}}(\{v\}) = d_{\mathrm{out}}(\{v\})$ for all $v \in V$. Prove that
there is a directed trail that contains every edge of D. Such a trail is called an Eulerian trail.

5.5.11.  Let  G  be  a  connected  simple  graph. 

(a)  Suppose  that  the  edges  of  G  can  be  directed  so  that  the  resulting  digraph  is  strongly  connected. 

Prove  that  G  has  no  isthmuses. 

*(b) Suppose that G has no isthmuses. Prove that the edges of G can be directed so that the resulting
graph  is  strongly  connected.  (This  seems  to  be  quite  difficult  given  the  material  you  have  had 
so  far.  We  will  return  to  it  later.) 

5.5.12. Let $R \subseteq S \times S$ be a binary relation on S. Suppose that $|S| = n$.

(a) How many reflexive binary relations R are there on S?

(b) How many reflexive and symmetric relations R are there on S?

(c) The relation R is unreflexive if for all $x \in S$, $(x, x) \notin R$. How many unreflexive, symmetric binary
relations R are there on S?

(d) How many symmetric relations R on S are not reflexive?

5.5.13. A binary relation R on S is an order relation if it is reflexive, antisymmetric, and transitive. R is
antisymmetric if for all $(x, y) \in R$ with $x \ne y$, $(y, x) \notin R$. Given an order relation R, the covering
relation H of R consists of all $(x, z) \in R$ such that there is no y, x < y < z, where $(x, y) \in R$. A
pictorial representation of the covering relation as a directed graph is called a "Hasse diagram" of H.

(a) Show that the divides relation on S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16} is an order
relation. By definition, (x, y) is in the divides relation on S if x is a factor of y. Thus, (4, 12) is
in the divides relation. $x \mid y$ is the usual notation for x is a factor of y.

(b)  Draw  the  directed  graph  of  the  covering  relation  of  R. 

5.5.14. Let R be a binary relation on S. Let $T_R$ be the smallest transitive relation with $R \subseteq T_R$. By "smallest"
we mean that if $R \subseteq N \subset T_R$ then N is not transitive. In other words, there is no proper subset of
$T_R$ that contains R and is transitive. Note that if R is already transitive then $T_R = R$. $T_R$ is called
the transitive closure of R. Let S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16} and let

H  =  {(2, 4),  (2, 6),  (2, 10),  (2, 14),  (3,  6),  (3, 9),  (3, 15),  (4, 8),  (4, 12),  (5, 10),  (5, 15),  (6, 12),  (7, 14)}. 

Find  the  transitive  closure  of  H. 
5.5.15. A graph is selected uniformly at random from all q-edge simple graphs with vertex set $\underline{n}$.

(a) What is the probability that the graph we have chosen is a tree? (Your answer will depend
on q.)

(b) Show that this probability is bigger than $(2/e)^{n-1}/n$ when $q = n - 1$.


*5.6   Computer  Representations  of  Graphs 


What  is  the  best  way  to  represent  a  graph  in  a  computer?  That  question  is  based  on  the  mistaken 
assumption  that  there  is  one  best  way.  In  fact,  there  are  a  variety  of  ways  to  represent  a  graph. 
We'll  briefly  discuss  two  common  ones:  adjacency  lists  and  matrices.  For  simplicity,  we  will  limit  our 
attention  to  simple  graphs  and  digraphs. 

Let G = (V, E) be a graph. For each $v \in V$, keep a list of those $x \in V$ such that $\{v, x\} \in E$.
This is a relatively compact method for storing the structure of G. The actual implementation may
be with an array or with a linked list. If D = (V, E) is a simple digraph, then we keep two lists for
each $v \in V$: one of those $x \in V$ such that $(x, v) \in E$ and one of those $y \in V$ such that $(v, y) \in E$.
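As a small illustration of the list representation just described, here is a sketch in Python. The function names `adjacency_lists` and `digraph_lists` are ours, and we assume the graph is given as a list of vertices together with a list of vertex pairs; Python dictionaries of lists stand in for the arrays or linked lists mentioned above.

```python
def adjacency_lists(vertices, edges):
    """For a simple graph, map each vertex v to the list of x with {v, x} in E."""
    adj = {v: [] for v in vertices}
    for u, x in edges:
        adj[u].append(x)   # x is adjacent to u
        adj[x].append(u)   # and u is adjacent to x
    return {v: sorted(nbrs) for v, nbrs in adj.items()}


def digraph_lists(vertices, edges):
    """For a simple digraph, keep two lists per vertex v:
    pred[v] holds the x with (x, v) in E, succ[v] holds the y with (v, y) in E."""
    pred = {v: [] for v in vertices}
    succ = {v: [] for v in vertices}
    for x, y in edges:
        succ[x].append(y)
        pred[y].append(x)
    return pred, succ
```

The storage used is proportional to |V| + |E|, which is why this method is compact when the graph has few edges.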

The method of linked lists is usually used with RP-trees. If the number of possible sons a vertex
can  have  is  not  limited,  a  variation  of  this  method  is  frequently  used.  Here's  one  such  variation.  Each 
vertex  v  has  a  list  of  four  vertices.  If  a  vertex  needed  in  the  list  does  not  exist,  that  fact  is  recorded 
in  the  list  (e.g.,  by  using  zero).  The  four  vertices  are: 

• the parent of the vertex v;

•  the  first  child  of  v  in  the  ordering  of  the  children  of  v; 

•  the  sibling  of  v  that  immediately  precedes  it  in  the  ordering  of  the  siblings  of  v; 

•  the  sibling  of  v  that  immediately  follows  it  in  the  ordering  of  the  siblings  of  v. 
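The four-entry list for each vertex can be sketched as follows. This is only an illustration under our own assumptions: the tree is supplied as a dictionary `children` giving the ordered children of each vertex, vertices are positive integers, and 0 records a missing entry as suggested above. The name `four_pointer_table` is hypothetical.

```python
def four_pointer_table(children, root):
    """Return a dict mapping each vertex v to the tuple
    (parent, first child, previous sibling, next sibling), with 0 for "none"."""
    table = {}
    stack = [(root, 0, 0, 0)]          # (vertex, parent, prev sibling, next sibling)
    while stack:
        v, parent, prev_sib, next_sib = stack.pop()
        kids = children.get(v, [])
        first_child = kids[0] if kids else 0
        table[v] = (parent, first_child, prev_sib, next_sib)
        for i, c in enumerate(kids):
            before = kids[i - 1] if i > 0 else 0
            after = kids[i + 1] if i + 1 < len(kids) else 0
            stack.append((c, v, before, after))
    return table
```

Every vertex uses exactly four entries no matter how many children it has, which is the point of this variation.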

Matrices are simply doubly indexed arrays. In the most common representation of a simple
graph $G = (\underline{n}, E)$, the matrix A(G) is n × n and

$$a_{i,j} = \begin{cases} 1 & \text{if } \{i, j\} \in E, \\ 0 & \text{otherwise.} \end{cases}$$

This representation can waste a considerable amount of space if the graph has relatively few edges. In
any event, only about half of A(G) is needed since $a_{i,j} = a_{j,i}$. The matrix representation is useful for
some calculations of numbers related to the structure of the graph. See, for example, Exercise 5.6.3.
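A minimal sketch of the matrix representation, assuming the vertices are numbered 1 through n and the edges are given as pairs; the function name and the `directed` flag are our own. For a digraph only the entry for (i, j) is set, so the matrix need not be symmetric.

```python
def adjacency_matrix(n, edges, directed=False):
    """Return the n x n 0/1 matrix A as a list of rows (vertices are 1..n)."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = 1
        if not directed:
            a[j - 1][i - 1] = 1    # a simple graph gives a symmetric matrix
    return a
```

Note that the matrix always occupies n² entries, however few edges there are.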


5.6.1. Suppose that G is a bipartite simple graph. (See Exercise 5.3.2 for a definition.) Prove that the
vertices can be numbered 1 through n = |V| such that for some k the matrix A(G) has a k × k block
of zeroes in its upper left corner and an (n − k) × (n − k) block of zeroes in its lower right corner.

5.6.2. Suppose that G is a simple graph with connected components $G_1, \ldots, G_m$. Number the vertices of G
by first numbering those in $G_1$, then those in $G_2$ and so on. Suppose that $G_i$ has $n_i$ vertices. Provide
a description of A(G) like that in Exercise 5.6.1.


For a simple digraph $D = (\underline{n}, E)$, we make a minor adjustment to define A(D):

$$a_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$

5.6.3. This exercise requires familiarity with matrix multiplication. Let $G = (\underline{n}, E)$ be a simple graph. Let
$a^{(k)}_{i,j}$ be the (i, j) element of the matrix $(A(G))^k$. Define a walk from i to j like a path, but allow
repetitions; i.e., a walk is a sequence $e_1, e_2, \ldots, e_{n-1}$ of edges and a sequence $v_1, \ldots, v_n$ of vertices such
that $v_1 = i$, $v_n = j$ and $\varphi(e_t) = \{v_t, v_{t+1}\}$ for each t. The length of the walk is n − 1, the number of edges it
contains, with repetitions counted.

(a) Prove that there is a walk of length k from i to j in G if and only if $a^{(k)}_{i,j} \neq 0$.

(b) Suppose that $i \neq j$. If $a^{(k)}_{i,j} \neq 0$ for some k > 0, let m be the smallest such k. Prove that there is
a path of length m between i and j.

(c)  Can  you  find  an  analog  of  the  previous  result  for  cycles?  Be  careful! 

(d) Prove that G is connected if and only if $(A(G) + I)^k$ contains no zeroes for all sufficiently large
k. Prove that all $k \geq n - 1$ are sufficiently large.

Hint. With $B = A(G) + I$ and $A(G)^0 = I$, prove that $b^{(k)}_{i,j} = \sum_{t=0}^{k} \binom{k}{t} a^{(t)}_{i,j}$.

5.6.4.  State  and  prove  results  like  those  in  the  previous  exercise  for  directed  simple  graphs. 

5.6.5. A matrix is called nilpotent if all sufficiently high powers of it consist entirely of zeroes. Find necessary
and sufficient conditions on a simple digraph D for A(D) to be nilpotent.

Notes  and  References 
CHAPTER 6

A Sampler of Graph Topics


Introduction 


A tree is a very important type of graph. For this reason, we've devoted quite a bit of space to
them in this text. In Chapter 3, we used decision trees to study the listing, ranking and unranking
of functions and to briefly study backtracking. In the next section, we'll focus on "spanning trees."
Various types of spanning trees play important roles in many algorithms; however, we will barely
touch these applications.

"Graph coloring" problems have been studied by mathematicians for some time and there are a
variety of interesting results, some of which we'll discuss. The subject originated from the problem
of coloring the countries on a map so that no adjacent countries have the same color. The subject
of map colorings is discussed in Section 6.3, where we consider planar graphs.

If you attempt to draw a graph on a piece of paper, you will often find that you can't do it
unless you allow some edges to cross. There are some graphs which can be drawn in the plane without
any edges crossing; e.g., all trees. These are called planar graphs. Drawing a graph without edges
crossing is called embedding the graph. $K_5$, the five vertex complete graph (all edges present), cannot
be embedded in the plane. Try it. Can you prove that it can't be embedded? It's not clear how to
go about proving that you haven't somehow missed a clever way to embed it. The impossibility of
embedding $K_5$ is one of the things that we will prove in Section 6.2.

Aspects  of  planar  graphs  of  interest  to  us  are  coloring,  testing  for  planarity  and  circuit  design. 
Our  discussion  of  coloring  planar  maps  relies  slightly  on  Section  6.2  and  our  discussion  of  a  planarity 
algorithm  relies  on  Section  6.1. 

The  edges  of  a  graph  can  be  thought  of  as  pipes  that  hold  a  fluid.  This  leads  to  the  idea  of  network 
flows.  We  can  also  interpret  the  edges  as  roads,  telephone  lines,  etc.  The  central  practical  problem  is 
to  maximize,  in  some  sense,  the  flow  through  a  network.  This  has  important  applications  in  industry. 
In  Section  6.4,  we  will  discuss  the  underlying  concepts  and  develop,  with  some  gaps,  an  algorithm 
for  maximizing  flow.  The  theory  of  flows  in  networks  has  close  ties  with  "linear  programming"  and 
"matching  theory."  We  will  explore  the  latter  briefly. 

We  can  ask  what  "typical"  large  graphs  are  like.  For  example, 

•  How  many  leaves  does  a  typical  tree  have? 

•  How  many  "triangles" — three  vertices  all  joined  by  edges — does  a  typical  graph  have? 
• How small can we make the function q(n) and still have most n-vertex, q-edge simple graphs
connected?

Questions  like  these  are  discussed  in  Section  6.5. 

Finally,  we  introduce  the  subject  of  "finite  state  machines"  in  Section  6.6.  They  provide  a 
theoretical  basis  for  studying  questions  related  to  the  computability  and  complexity  of  functions 
and  algorithms.  Various  classes  of  finite  state  machines  are  closely  associated  with  various  classes 
of  grammars,  an  association  we'll  explore  briefly  in  Section  9.2.  In  Example  11.2  (p.  310),  we'll  see 
how  some  machines  provide  a  method  for  solving  certain  types  of  enumeration  problems. 

The sections in this chapter are largely independent of each other. Other parts of the book do
not require the material in this chapter. If you are not familiar with Θ( ) and O( ) notation, you
will need to read Appendix B for some of the examples in this chapter.

6.1   Spanning  Trees 


Here's  the  definition  of  what  we'll  be  studying  in  this  section. 

Definition 6.1 Spanning tree A spanning tree of a (simple) graph G = (V, E) is a
subgraph T = (V, E') which is a tree and has the same set of vertices as G.

Example  6.1  Since  a  tree  is  connected,  a  graph  with  a  spanning  tree  must  be  connected.  On 
the  other  hand,  you  were  asked  to  prove  in  Exercise  5.4.2  (p.  139)  that  every  connected  graph  has  a 
spanning  tree.  Thus  we  have:  A  graph  is  connected  if  and  only  if  it  has  a  spanning  tree.  It  follows 
that,  if  we  had  an  algorithm  that  was  guaranteed  to  find  a  spanning  tree  whenever  such  a  tree  exists, 
then this algorithm could be used to decide if a graph is connected. □

In this section, we study minimum weight spanning trees and lineal spanning trees.

Minimum Weight Spanning Trees


Suppose  we  wish  to  install  "lines"  to  link  various  sites  together.  A  site  may  be  a  computer  installation, 
a  town  or  a  spy.  A  line  may  be  a  digital  communication  channel,  a  rail  line  or  a  contact  arrangement. 
We'll  assume  that 

•  a  line  operates  in  both  directions; 

•  it  must  be  possible  to  get  from  any  site  to  any  other  site  using  lines; 

• each possible line has a cost (rental rate, construction costs or likelihood of detection) independent
of each other line's cost;

•  we  want  to  choose  lines  to  minimize  the  total  cost. 

We can think of the sites as vertices V in a graph, the lines as edges E and the costs as a function
λ from the edges to the real numbers. Let T = (V, E') be a subgraph of G = (V, E). Define λ(T),
the weight of T, to be the sum of λ(e) over all $e \in E'$. Minimizing total cost means choosing T so
that λ(T) is a minimum. Getting from one site to another means choosing T so that it is connected.
It follows that we should choose T to be a spanning tree: if T had more edges than a spanning
tree, we could delete some; if T had fewer, it would not be connected. (See Exercise 5.4.2 (p. 139).)
We call such a T a minimum weight spanning tree of (G, λ), or simply of G, with λ understood from
context.
How can we find a minimum weight spanning tree T? One approach is to construct T by adding
an  edge  at  a  time  in  a  greedy  way.  Since  we  want  to  minimize  the  weight,  "greedy"  means  keeping 
the  weight  of  each  edge  we  add  as  low  as  possible.  Here's  such  an  algorithm. 

Theorem 6.1 Prim's Algorithm (Minimum Weight Spanning Tree) Let G = (V, E)
be a simple graph with edge weights given by λ. If the following algorithm stops with $V' \neq V$,
G has no spanning tree; otherwise, (V, E') is a minimum weight spanning tree for G.

1. Start: Let $E' = \emptyset$ and let $V' = \{v_0\}$ where $v_0$ is any vertex in V.

2. Possible Edges: Let $F \subseteq E$ be those edges {x, y} with $x \in V'$ and $y \notin V'$. If $F = \emptyset$, stop.

3. Choose Edge Greedily: Let f = {x, y} be such that λ(f) is a minimum over all $f \in F$.
Replace V' with $V' \cup \{y\}$ and E' with $E' \cup \{f\}$. Go to Step 2.

Proof: We begin with the first part; i.e., if the algorithm stops with $V' \neq V$, then G has no spanning
tree. Suppose that $V' \neq V$ and that there is a spanning tree. We will prove that the algorithm does
not stop at V'. Choose $u \in V - V'$ and $v \in V'$. Since G is connected, there must be a path from u
to v. Each vertex on the path is either in V' or not. Since $u \notin V'$ and $v \in V'$, there must be an edge
f on the path with one end in V' and one end not in V'. But then $f \in F$ and so the algorithm does
not stop at V'.

We now prove that, if G has a spanning tree, then (V, E') is a minimum weight spanning tree.
One way to do this is by induction: We will prove that at each step there is a minimum weight
spanning tree of G that contains E'.

The starting case for the induction is the first step in the algorithm; i.e., $E' = \emptyset$. Since G has
a spanning tree, it must have a minimum weight spanning tree. The edges of this tree obviously
contain the empty set, which is what E' equals at the start.

We now carry out the inductive step of the proof. Let V' and E' be the values going into Step
3 and let f = {x, y} be the edge chosen there. By the induction hypothesis, there is a minimum
weight spanning tree T of G that contains the edges E'. If it also contains the edge f, we are done.
Suppose it does not contain f. We will prove that we can replace an edge in the minimum weight
tree with f and still achieve minimum weight.

Since T contains all the vertices of G, it contains x and y and, also, some path P from x to y.
Since $x \in V'$ and $y \notin V'$, this path must contain an edge e = {u, v} with $u \in V'$ and $v \notin V'$. We
now prove that removing e from T and then adding f to T will still give a minimum spanning tree.

By the definition of F in Step 2, $e \in F$ and so, by the definition of f, $\lambda(e) \geq \lambda(f)$. Thus the
weight of the tree does not increase. If we show that the result is still a tree, this will complete the
proof.

The path P together with the edge f forms a cycle in G. Removing e from P and adding f still
allows us to reach every vertex in P and so the altered tree is still connected. It is also still a tree
because it contains no cycles: adding f created only one cycle and removing e destroyed it. This
completes the proof that the algorithm is correct. □

This  proof  illustrates  an  important  technique  for  proving  that  algorithms  are  correct: 

Make  an  assertion  about  the  algorithm  and  then  prove  it  inductively. 

In  this  case  the  assertion  was  the  existence  of  a  minimum  weight  spanning  tree  having  certain  edges. 
Induction  on  the  number  of  those  edges  started  easily  and  the  inductive  step  was  not  too  difficult. 
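The three steps of Theorem 6.1 can be sketched directly in Python. This is a deliberately naive transcription, not an efficient implementation: we assume the weights λ are supplied as a dictionary `weights` mapping vertex pairs (x, y) to numbers, and the name `prim` and the return convention are ours. Step 2 rescans every edge each time, just as the statement of the algorithm does.

```python
def prim(weights, start):
    """Return (V', E') when the algorithm of Theorem 6.1 stops.

    weights: dict mapping an edge (x, y) to its weight (each edge listed once).
    start:   the vertex v0 chosen in Step 1.
    """
    v_prime = {start}          # Step 1: V' = {v0}
    e_prime = []               #         E' is empty
    while True:
        # Step 2: F = edges with exactly one end in V'.
        frontier = [(x, y) for (x, y) in weights
                    if (x in v_prime) != (y in v_prime)]
        if not frontier:
            return v_prime, e_prime
        # Step 3: choose an edge of minimum weight in F and add it.
        f = min(frontier, key=weights.get)
        e_prime.append(f)
        v_prime.update(f)
```

If the returned V' is not the whole vertex set, the graph has no spanning tree; otherwise the returned edges form a minimum weight spanning tree.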

We could construct an even greedier algorithm: At each step, add the lowest weight edge that
does not create a cycle. The intermediate graphs (V', E') that are built in this way may not be
connected; however, if (V, E) was connected, the end result will be a minimum weight spanning tree.
We leave it to you to formulate the algorithm carefully and prove that it works in Exercise 6.1.5. This
algorithm, with some tricky, efficient handling of the data structures, is called Kruskal's Algorithm.
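Here is one way the greedier algorithm might be sketched. The text does not say which data structures are meant; we assume the common choice of a "union-find" structure to test whether an edge would create a cycle, and we again take the weights as a dictionary of vertex pairs. The names `kruskal` and `find` are ours.

```python
def kruskal(vertices, weights):
    """Return the edges chosen by the greedier algorithm, in the order chosen.

    weights: dict mapping an edge (x, y) to its weight.
    """
    parent = {v: v for v in vertices}      # union-find forest over the vertices

    def find(v):
        # Follow parent links to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps the trees shallow
            v = parent[v]
        return v

    chosen = []
    for x, y in sorted(weights, key=weights.get):   # lowest weight first
        rx, ry = find(x), find(y)
        if rx != ry:                       # x and y not yet joined: no cycle created
            parent[rx] = ry
            chosen.append((x, y))
    return chosen
```

The sketch only shows the mechanics; the proof that the result has minimum weight is still the subject of Exercise 6.1.5.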
Example 6.2 A more general spanning tree algorithm The discussion so far has centered
around choosing edges that will be in our minimum weight spanning tree. We could also choose
edges that will not be in our minimum weight spanning tree. This can be done by selecting a cycle
of edges, none of which have been rejected, and then rejecting an edge for which λ is largest among
the edges in the cycle. When no more cycles remain, the remaining edges form a minimum weight
spanning tree. We leave it to you to prove this. (Exercise 6.1.1)

These  ideas  can  be  combined:  We  have  two  sets  A  and  R  of  edges  that  begin  by  both  being 
empty.  At  each  step,  we  somehow  add  an  edge  to  either  A  or  R  and  claim  that  there  exists  a 
minimum  weight  spanning  tree  that  contains  all  of  the  edges  in  A  and  none  of  the  edges  in  R.  Of 
course,  this  will  be  true  at  the  start.  The  proof  that  it  is  true  in  general  will  use  induction  and  will 
depend  on  the  specific  algorithm  used  for  adding  edges  to  A  and  R. 

What sort of algorithms can be built using this idea? We could of course simply use the greedy
algorithm for adding edges to A all the time, or we could use the greedier algorithm for adding edges
to A all the time, or we could use the cycle approach mentioned at the start of this example to add
edges to R. Something new: we could sometimes add edges to A and sometimes to R, whichever is
more convenient. This can be useful if we are finding the tree by hand. □

Example 6.3 How fast is the algorithm? We'll analyze the minimum weight (or cost) spanning
tree algorithm. Here's a brief description of it. We let G = (V, E) be the given simple graph.

1. Initialize: Select a vertex $v \in V$ and let the tree T consist of v.

2. Select an edge: If there are no edges between $V_T$ (the vertices of T) and $V - V_T$, stop; otherwise,
add the one of minimum cost to T and go to Step 2.

Of course, when we add an edge to T, we also add the vertex on it that was not already in T. When
the algorithm stops, $V_T$ is the vertex set of the component of G containing v. If G is connected, T
is a minimum cost spanning tree.

Suppose that T currently has k vertices. How much work is required to add the next edge to T
in the worst case? If the answer is $t_k$, then the worst case running time is $\Theta(t_1 + \cdots + t_{|V|-1})$. We
can't determine $t_k$ since we didn't specify enough details in Step 2. We'll fill them in now. For each
vertex u in T, we look at all edges containing it. If {u, x} is such an edge, we check to see if $x \in V_T$
and, if not, we check to see if {u, x} is the least cost edge found so far in the execution of Step 2.
Both these checks can be performed in constant time; i.e., the time does not depend on $|V_T|$ or the
size of G. Since we examine at most |E| edges, $t_k = O(|E|)$. Since k ranges from 1 to |V| − 1 for a
connected graph G, the worst case running time is $O(|V|\,|E|)$.

We cannot say that the worst case running time is $\Theta(|V|\,|E|)$ because we've said "at most" in
our argument.

Are there faster algorithms than this? Assuming no one has organized our data according to
edge costs, we must certainly look at each edge cost at least once, so any algorithm must have best
case running time at least $\Theta(|E|)$. Algorithms with worst case running times $\Theta(|E| \ln\ln|V|)$ and
$\Theta(|V|^2)$ are known. If G has a lot of edges, we could have $|E| = \Theta(|V|^2)$ and so $\Theta(|V|^2)$ is $\Theta(|E|)$.

Does this mean that the $\Theta(|V|^2)$ algorithm is best for a graph with about $|V|^2/4$ edges, the
typical number in a "random" graph? Not necessarily. To illustrate why not, suppose that the
running time of the first algorithm is close to $4|E| \ln\ln|V|$ and that of the second is close to $3|V|^2$.
The first algorithm will be better as long as

$$3|V|^2 > 4|E| \ln\ln|V| = |V|^2 \ln\ln|V|.$$

Solving for |V|, we obtain $|V| < \exp(e^3) \approx 5 \times 10^8$. This means that the $\Theta(|E| \ln\ln|V|)$ algorithm,
which is slower when |V| is very large, would actually be faster for all practical values of
|V|. (Remember, this is hypothetical because we assumed the values of the constants in the Θ(···)
expressions.) See Example B.4 (p. 373) for further discussion. □
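The arithmetic in this comparison is easy to check numerically. The sketch below uses the same hypothetical constants as the example, with $|E| = |V|^2/4$; the function names are ours. The two costs are equal exactly when ln ln |V| = 3.

```python
import math


def first_cost(v):
    """Hypothetical cost 4|E| ln ln |V| with |E| = |V|^2 / 4."""
    return v * v * math.log(math.log(v))


def second_cost(v):
    """Hypothetical cost 3|V|^2."""
    return 3 * v * v


# The value of |V| at which ln ln |V| = 3, i.e. where the two costs agree.
crossover = math.exp(math.exp(3))
```

For instance, `first_cost(10**6) < second_cost(10**6)`, while at `10**9` vertices the comparison has reversed; `crossover` is a little over 5 × 10^8.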
Figure 6.1 An example of the division of vertices for the lineal spanning tree induction. The subgraphs
A and B are shaded. [Labels in the figure: "there is a path to r containing f" and "every path to r misses f".]


Lineal  Spanning  Trees 


If we simply want to find any spanning tree of G, we can choose any values for the function λ, and
use the minimal weight spanning tree algorithm of Theorem 6.1. Put another way, in Step 3 we may
choose any edge $f \in F$. Sometimes it is important to restrict the choice of f in some way so that
the spanning tree will have some special property other than being minimal.

An important example of such a special property concerns certain rooted spanning trees. To
define the trees we are interested in, we borrow some terminology from genealogy.

Definition 6.2 Lineal spanning tree Let x and y be two vertices in a rooted tree with
root r. If x is on the path connecting r to y, we say that y is a descendant of x. (In particular,
all vertices are descendants of r.) If one of u and v is a descendant of the other, we say that
{u, v} is a lineal pair. A lineal spanning tree or depth first spanning tree of a connected
graph G = (V, E) is a rooted spanning tree of G such that each edge {u, v} of G is a lineal pair.

To see some examples of a lineal spanning tree, look back at Figure 5.5 (p. 139). It is the lineal
spanning tree of a graph, namely itself. We can add some edges to this graph, for example {a, f}
and {b, j}, and still have Figure 5.5 as a lineal spanning tree. On the other hand, if we added the
edge {e, j}, the graph would not have Figure 5.5 as a lineal spanning tree.

How  can  we  find  a  lineal  spanning  tree  of  a  graph?  That  question  may  be  a  bit  premature — we 
don't  even  know  when  such  a  tree  exists.  We'll  prove 

Theorem 6.2 Lineal spanning tree existence Every connected graph G has a lineal
spanning tree. In fact, given any vertex r of G, there is a lineal spanning tree of G with root r.

Proof: Our proof will be by induction on the number of vertices in G. The theorem is trivially
true for a graph with one vertex. Suppose we know that the claim is true for graphs with fewer than
n vertices. Let G = (V, E) have n vertices and let $f = \{r, s\} \in E$. We will prove that G has a lineal
spanning tree with root r.

Let $S \subseteq V$ be those vertices of G that can be reached by a path starting at r and containing
the edge f. Note that $r \notin S$ because a path cannot contain repeated vertices; however, $s \in S$. Let
R = V − S. If $x \in R$, every path from r to x misses s, for if not we could simply go from r to s on
f and then follow the path to x.

Let  A  be  the  subgraph  of  G  induced  by  S  and  let  B  be  the  subgraph  of  G  induced  by  R.  (Recall 
that  the  subgraph  induced  by  S  is  the  set  of  all  edges  in  G  whose  end  points  both  lie  in  S.)  We  now 
prove: 

Each edge of G that does not contain r lies in either A or B.          (6.1)

Suppose we had an edge {u, v} with u in A and $v \neq r$ in B. There is a path from r to u using f. By
adding v to the path, we conclude that $v \in S$, a contradiction.
We claim that A is connected and B is connected. Suppose x and y are vertices that are both in
A  or  both  in  B.  Since  G  is  connected,  there  is  a  path  joining  them  in  G.  If  an  edge  of  G  does  not  lie 
in  A  or  in  B,  then,  by  (6.1),  r  is  one  of  its  vertices.  A  path  starting  in  A  or  B,  leaving  it  and  then 
returning  would  therefore  contain  r  twice,  contradicting  the  definition  of  a  path.  Thus  the  path  lies 
entirely  in  A  or  entirely  in  B. 

Since neither R nor S is empty, each of A and B has fewer vertices than G. Thus, since A and
B are connected, we may apply the induction hypothesis to A and to B. Let T(A) be the lineal
spanning tree of A rooted at s and let T(B) be the lineal spanning tree of B rooted at r.

Join T(A) to T(B) by f to produce a connected subgraph T of G with vertex set V and root r.
Since T(A) and T(B) have no cycles, it follows that T is a spanning tree of G.

To complete the proof, we must show that T is lineal. Let $e = \{u, v\} \in E$. If one of u and v is r,
then e is a lineal pair of T. Suppose that $r \notin e = \{u, v\}$. By (6.1), e lies in A or B. Since T(A) and
T(B) are lineal spanning trees, e is a lineal pair of either T(A) or T(B) and hence of T. □
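The proof is an induction rather than an algorithm, but the alternative name "depth first spanning tree" points to a standard way of building such a tree: a depth-first search from r. The sketch below is ours and rests on stated assumptions: the connected graph is given as a dictionary `adj` of neighbor lists, and the tree is returned as a map from each vertex to its parent. A helper checks the lineal pair condition of Definition 6.2.

```python
def lineal_spanning_tree(adj, root):
    """Depth-first search from root; returns a dict vertex -> parent (root -> None)."""
    parent = {root: None}
    stack = [(root, iter(adj[root]))]      # current path from the root, with neighbor iterators
    while stack:
        v, nbrs = stack[-1]
        for x in nbrs:
            if x not in parent:            # first visit to x: go deeper along {v, x}
                parent[x] = v
                stack.append((x, iter(adj[x])))
                break
        else:
            stack.pop()                    # every neighbor of v has been visited; back up
    return parent


def is_lineal_pair(parent, u, v):
    """True if one of u and v lies on the tree path from the root to the other."""
    def path_to_root(w):
        seen = set()
        while w is not None:
            seen.add(w)
            w = parent[w]
        return seen
    return u in path_to_root(v) or v in path_to_root(u)
```

Moving first across an edge {r, s} and exhausting everything reachable from s before returning to r loosely mirrors the split into A and B used in the proof.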

Example 6.4 Bicomponents of graphs Let G = (V, E) be a simple graph. For $e, f \in E$ write
$e \sim f$ if either e = f or there is a cycle of G that contains both e and f. We claim that this is an
equivalence relation. To see what we're talking about, let's look at an example.

[Picture (6.2): a graph with each edge labeled by its equivalence class; not reproduced here.]

The edges fall into four equivalence classes, which we've arbitrarily called A, B, T and X. Each edge
has the letter of its equivalence class next to it. Notice that the vertices do not fall into equivalence
classes because some of them would have to belong to more than one equivalence class.

Now we'll prove that we have an equivalence relation by using Theorem 5.1 (p. 127). The reflexive
and symmetric parts are easy. Suppose that $e \sim f \sim g$. If e = g, then $e \sim g$; so suppose that $e \neq g$.
Let $e = \{v_1, v_2\}$. Let C(e, f) be the cycle containing e and f and C(f, g) the cycle containing f
and g. In C(e, f) there is a path $P_1$ from $v_1$ to $v_2$ that does not contain e. Let x and $y \neq x$ be the
first and last vertices on $P_1$ that lie on the cycle containing f and g. We know that there must be
such points because the edge f is on $P_1$. Let $P_2$ be the path in C(e, f) from y to x containing e. In
C(f, g) there is a path $P_3$ from x to y containing g. We have shown that $P_2$ followed by $P_3$ defines
a cycle containing e and g. Hence $e \sim g$.

Since ∼ is an equivalence relation on the edges of G, it partitions them. If the partition has
only one block, then we say that G is a biconnected graph. If E' is a block in the partition, the
subgraph of G induced by E' is called a bicomponent of G. Note that the bicomponents of G are
not necessarily disjoint: Bicomponents may have vertices in common (but never edges). The picture
(6.2) has four bicomponents.

Finding  the  bicomponents  of  a  graph  is  important  when  we  wish  to  decide  if  the  graph  can  be 
drawn  in  the  plane  so  that  no  edges  cross.  We  discuss  this  briefly  at  the  end  of  Section  6.3. 

Biconnectivity is closely related to lineal spanning trees. Suppose T is a lineal spanning tree of
G and that the vertices x and y are in the same bicomponent of G. Then either

• {x, y} is an edge that is a bicomponent by itself or

• there is a cycle containing x and y.

In either case, since T is a lineal spanning tree, {x, y} is a lineal pair. This leads to an algorithm for
finding bicomponents: Suppose e = {x, y} is an edge that is not in T. If f is an edge on the path
from x to y in T, write $e \equiv f$. As it stands, ≡ is not an equivalence relation; however, it can be made
into one by adding what is needed to ensure reflexivity, symmetry and transitivity. In the resulting
relation it turns out that $e \equiv f$ if and only if e and f are in the same bicomponent. (This requires
proof, which we omit.) You might like to experiment with this idea. □
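If you would like to experiment, the idea can be sketched as follows. The sketch makes several assumptions of its own: the connected simple graph is a dictionary `adj` of neighbor lists, the lineal spanning tree is found by depth-first search, and the equivalence relation is generated with a union-find structure on the edges. It relies on the unproved claim just stated, so treat its output as something to test rather than as established fact.

```python
def bicomponents(adj, root):
    """Group the edges of a connected simple graph into candidate bicomponents."""
    # Build a depth-first (lineal) spanning tree, recording parents and depths.
    parent, depth = {root: None}, {root: 0}
    stack = [(root, iter(adj[root]))]
    while stack:
        v, nbrs = stack[-1]
        for x in nbrs:
            if x not in parent:
                parent[x], depth[x] = v, depth[v] + 1
                stack.append((x, iter(adj[x])))
                break
        else:
            stack.pop()

    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    leader = {e: e for e in edges}         # union-find over the edges

    def find(e):
        while leader[e] != e:
            leader[e] = leader[leader[e]]
            e = leader[e]
        return e

    for e in edges:
        x, y = tuple(e)
        if parent[x] == y or parent[y] == x:
            continue                       # e is a tree edge; only non-tree edges start a merge
        if depth[x] < depth[y]:
            x, y = y, x                    # make x the deeper end; y is then an ancestor of x
        while x != y:                      # walk the tree path from x up to y
            f = frozenset((x, parent[x]))
            leader[find(f)] = find(e)      # put f in the same class as e
            x = parent[x]

    classes = {}
    for e in edges:
        classes.setdefault(find(e), set()).add(e)
    return list(classes.values())
```

For two triangles joined by a single edge, the sketch returns three classes: each triangle and the joining edge by itself.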

Exercises 


6.1.1. A cycle approach to forming a minimum weight spanning tree was discussed in Example 6.2: Throw
away largest weight edges that do not disconnect the graph. Prove that it actually leads to a minimum
weight spanning tree as follows.

(a) Let T be a minimum weight spanning tree and let e be the first edge that is removed by the
algorithm but is contained in T. Prove that T with e deleted consists of two components, $T_1$ and
$T_2$.

(b) Call any edge in the original graph that has one end in $T_1$ and one in $T_2$ a connector. Prove
that, if f is a connector, then $\lambda(f) \geq \lambda(e)$.

(c) Let T* be the spanning tree produced by the algorithm. Prove that, if e is added to T*, then
the resulting graph has a cycle containing e and some connector f.

(d) Let f be the edge in (c). Prove that $\lambda(f) \leq \lambda(e)$.

(e) Let f be the edge in (c). Prove that T with e removed and f added is also a minimum weight
spanning tree.

*(f)  Complete  the  proof. 

6.1.2. Let G be a connected simple graph and let $B_1$ and $B_2 \ne B_1$ be two bicomponents of G. Prove that
$B_1$ and $B_2$ have at most one vertex in common.

6.1.3. Let G be a connected simple graph. Let Q(G) be the set of bicomponents of G and let P(G) be the
set of all vertices of G that belong to more than one bicomponent; i.e., P(G) is the union of the sets
$H \cap K$ over all pairs $H \ne K$ with $H, K \in Q(G)$. Define a simple bipartite graph B(G) with vertex
set $W = P(G) \cup Q(G)$ and {u, X} an edge if $u \in P(G)$ and $u \in X \in Q(G)$. (See Exercise 5.3.2 for a
definition of bipartite.)

(a) Construct three examples of B(G), each containing at least four vertices.

(b) Prove that B(G) is a connected simple graph.

(c) Prove that B(G) is a tree.

Hint. Prove that a cycle in B(G) would lead to a cycle in G that involved edges in different
bicomponents.

(d) Prove that P(G) is precisely the articulation points of G. (See Exercise 5.3.3 for a definition.)

6.1.4. Using the proof of Theorem 6.1, prove: If $\lambda$ is an injection from E to $\mathbb{R}$ (the real numbers), then the
minimum weight spanning tree is unique.

*6.1.5. We will study the greedier algorithm that was mentioned in the text. Suppose the graph $G = (V, E, \varphi)$
has n vertices. From previous work on trees, we know that any spanning tree has n − 1 edges. Let
$g_1, g_2, \dots, g_{n-1}$ be the edges in the order chosen by the greedier algorithm. Let $e_1, e_2, \dots, e_{n-1}$ be
the edges in any spanning tree of G, ordered so that $\lambda(e_i) \le \lambda(e_{i+1})$. Our goal is to prove that
$\lambda(g_i) \le \lambda(e_i)$ for $1 \le i < n$. It follows immediately from this that the greedier algorithm produces a
minimum weight spanning tree.

(a) Prove that the vertices of G together with any k edges of G that contain no cycles is a graph
with n − k components each of which is a tree.

(b) Let $G_k$ be the graph with vertices V and edges $g_1, \dots, g_k$. Let $H_k$ be the graph with vertices V
and edges $e_1, \dots, e_k$. Prove that one of the edges of $H_{k+1}$ can be added to $G_k$ to give a graph
with no cycles.

Hint. Prove that there is some component of $H_{k+1}$ that contains vertices from more than one
component of $G_k$ and then find an appropriate edge in that component of $H_{k+1}$.

(c) Prove that $\lambda(g_i) \le \lambda(e_i)$ for $1 \le i < n$.

6.1.6. Using the result in Exercise 6.1.5, prove that whenever $\lambda$ is an injection the minimum weight spanning
tree is unique.


6.1.7. For each of the following graphs:

[Figure: three graphs, (1), (2) and (3), with labeled vertices including A, B and D.]


(a)  Find  all  spanning  trees. 

(b)  Find  all  spanning  trees  up  to  isomorphism;  that  is,  find  all  distinct  trees  when  the  vertex  labels 
are  removed. 

(c)  Find  all  depth-first  spanning  trees  rooted  at  A. 

(d)  Find  all  depth-first  spanning  trees  rooted  at  B. 

6.1.8. For each of the following graphs:

[Figure: three edge-weighted graphs, (1), (2) and (3), with labeled vertices including A, B and D.]


(a)  Find  all  minimal  spanning  trees. 

(b)  Find  all  minimal  spanning  trees  up  to  isomorphism;  that  is,  find  all  distinct  trees  when  the 
vertex  labels  are  removed. 

(c)  Find  all  minimal  depth-first  spanning  trees  rooted  at  A. 

(d) Find all minimal depth-first spanning trees rooted at B.


6.1.9. In the following graph, the edges are weighted either 1, 2, 3, or 4.

[Figure: an edge-weighted graph with labeled vertices.]


(a)  Find  a  minimal  spanning  tree  using  the  method  of  Theorem  6.1. 

(b)  Find  a  minimal  spanning  tree  using  the  method  of  Example  6.2. 

(c)  Find  a  minimal  spanning  tree  using  the  method  of  Exercise  6.1.5. 

(d) Find a depth-first spanning tree rooted at K.

Example 6.5  Register allocation  Optimizing compilers use a variety of techniques to produce
faster code. One obvious way to produce faster code is to keep variables in registers so that memory
references  are  eliminated.  Unfortunately,  there  are  often  not  enough  registers  available  to  do  this, 
so  choices  must  be  made.  For  simplicity,  assume  that  the  registers  and  variables  are  all  the  same 
size.  Suppose  that,  by  some  process,  we  have  gotten  a  list  of  variables  that  we  would  like  to  keep  in 
registers. 

Can  we  keep  them  in  registers?  If  the  number  of  variables  does  not  exceed  the  number  of 
available  registers,  we  can  obviously  do  it.  This  sufficient  condition  is  not  necessary:  We  may  have 
two  variables  that  are  only  used  in  two  separate  parts  of  the  program.  They  could  share  a  register. 

This  suggests  that  we  can  define  a  binary  relation  among  variables.  We  could  say  that  two 
variables  are  "compatible"  if  they  may  share  a  register.  Alternatively,  we  could  say  that  two  variables 
"conflict"  if  they  cannot  share  a  register.  Two  variables  are  either  compatible  or  in  conflict,  but  not 
both.  Thus  we  can  derive  one  relation  from  the  other  and  it  is  rather  arbitrary  which  we  focus  on. 
For  our  purposes,  the  conflict  relation  is  better. 

Construct  a  simple  graph  whose  vertices  are  the  variables.  Two  variables  are  joined  by  an  edge 
if and only if they conflict. A register assignment can be found if and only if we can find a function
$\lambda$ from the set of vertices to the set of registers such that $\lambda(v) \ne \lambda(w)$ whenever {v, w} is an edge.
(This just says that if v and w conflict they must have different registers assigned to them.) This
section studies such "vertex labelings" $\lambda$. □

Definition 6.3  Graph coloring  Let G = (V, E) be a simple graph and C a set. A proper
coloring of G using the "colors" in C is a function $\lambda: V \to C$ such that $\lambda(v) \ne \lambda(w)$ whenever
$\{v, w\} \in E$.

Some  people  omit  "proper"  and  simply  refer  to  a  "coloring."  A  solution  to  the  register  allocation 
problem  is  a  proper  coloring  of  the  "conflict  graph"  of  the  variables  using  the  set  of  registers  as 
colors. 
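The definition translates directly into a check. The following Python sketch tests whether a given labeling is a proper coloring; the conflict graph, the variable names a, b, c and the register names R0, R1 are made up for illustration.

```python
def is_proper_coloring(edges, coloring):
    """Return True if no edge joins two vertices with the same color."""
    return all(coloring[v] != coloring[w] for v, w in edges)


# Hypothetical conflict graph: a conflicts with b, and b conflicts with c.
conflicts = [("a", "b"), ("b", "c")]

# a and c never conflict, so they may share register R0.
print(is_proper_coloring(conflicts, {"a": "R0", "b": "R1", "c": "R0"}))  # True
# a and b conflict, so assigning both to R0 is not allowed.
print(is_proper_coloring(conflicts, {"a": "R0", "b": "R0", "c": "R1"}))  # False
```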

Given  G  and  C,  we  can  ask  for  reasonably  fast  algorithms  to  answer  various  questions  about 
proper  colorings: 

1.  Does  there  exist  a  proper  coloring  of  G  with  C? 

2.  What  is  a  good  way  to  find  a  proper  coloring  of  G  with  C,  if  one  exists? 

3.  How  many  proper  colorings  of  G  with  C  are  there? 

Question  1  could  be  answered  by  an  algorithm  that  attempts  to  construct  a  proper  coloring  and 
fails  only  if  none  exist.  It  could  also  be  answered  by  calculating  the  number  of  proper  colorings  and 
discovering  that  this  number  is  zero.  Before  trying  to  answer  these  questions,  let's  look  at  more 
examples.

Example 6.6  Scheduling problems  Register allocation is an example of a simple scheduling
problem. In this terminology, variables are scheduled for storage in registers. A scheduling problem
can involve a variety of constraints. Some of these can be sequential in nature: Problem definition
must occur before algorithm formulation, which must in turn occur before division of programming
tasks. Others can be conflict avoidance like register allocation. The simplest sort of conflict avoidance
conditions are of the form "v and w cannot be scheduled together," which we encountered with
register allocation. These can be phrased as graph coloring problems.

Here's an example. Suppose we want to make up a course schedule that avoids obvious conflicts.
We could let the vertices of our graph be the courses. Two courses are connected by an edge if we
expect a student in one course will want to enroll in the other during the same term. The colors are
the times at which courses meet. □

Example 6.7  Map coloring  In loose terms, a map is a collection of regions, called countries
and water, that partition a sphere. (A sphere is the surface of a ball.) To make it easy to distinguish
regions  that  have  a  common  boundary,  they  should  be  different  colors.  ("Common  boundary"  means 
an  actual  line  segment  or  curve  and  not  just  a  single  point.)  This  can  be  formulated  as  a  graph 
coloring  problem  by  letting  the  regions  be  vertices  and  by  joining  two  vertices  with  an  edge  if  the 
corresponding  regions  have  a  common  boundary.  What  problems,  if  any,  are  caused  by  our  loose 
definition  of  a  map?  A  country  may  consist  of  several  pieces,  like  the  United  States  which  includes 
Alaska  and  Hawaii.  This  is  ruled  out  in  a  careful  definition  of  a  map. 

It  is  easy  to  find  a  map  that  requires  four  colors.  Try  to  find  such  a  map  yourself.  Later  we  will 
prove  that  any  map  can  be  colored  with  five  colors.  How  many  colors  are  needed?  From  the  past  few 
sentences,  at  least  four  and  at  most  five.  Four  colors  suffice.  This  fact  is  known  as  the  Four  Color 
Theorem.  At  present,  the  only  way  this  can  be  proved  is  by  extensive  computer  calculations.  This 
was  done  by  Appel  and  Haken  in  1976. 

Maps  can  be  defined  on  more  complicated  surfaces  (like  a  torus — the  surface  of  a  doughnut). 
For each such surface S there is a number n(S) such that some map requires n(S) colors and no
map requires more. A fairly simple formula has been found and proved for n(S). It is somewhat
amusing that computer calculations are needed only in what appears to be the simplest case, when
S is equivalent to a sphere. □

How can we construct a proper coloring of a graph? Suppose we have n vertices and c colors. We
could systematically try all $c^n$ possible assignments of colors to vertices until we find one that works
or we find that there is no proper coloring. Backtracking on a decision tree can save considerable
time. A decision corresponds to assigning a color to a vertex. Suppose that we are at some node t in
the decision tree where we have already colored vertices $v_1, \dots, v_k$ of the graph. The edges leading
out of t correspond to the different ways of coloring $v_{k+1}$ so that it is not the same color as any
of $v_1, \dots, v_k$ that are adjacent to it. It is not clear how fast this algorithm is or if we could find a
substantially better one.
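The backtracking idea can be sketched in a few lines of Python. This is an illustration rather than a tuned algorithm: the vertices are colored in the order given, and the function name `find_proper_coloring` is our own.

```python
def find_proper_coloring(vertices, edges, colors):
    """Backtracking search for a proper coloring.

    Returns a dict mapping each vertex to a color, or None if no proper
    coloring with the given colors exists.
    """
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    order = list(vertices)
    coloring = {}

    def extend(k):
        # Vertices order[0..k-1] are colored; try each way to color order[k].
        if k == len(order):
            return True
        v = order[k]
        used = {coloring[u] for u in adj[v] if u in coloring}
        for c in colors:
            if c not in used:
                coloring[v] = c
                if extend(k + 1):
                    return True
                del coloring[v]  # backtrack
        return False

    return dict(coloring) if extend(0) else None


# A cycle on five vertices needs three colors.
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
print(find_proper_coloring(range(5), cycle5, ["red", "green"]))          # None
print(find_proper_coloring(range(5), cycle5, ["red", "green", "blue"]))
```

The branches out of a node of the decision tree are exactly the colors not in `used`, so a partial coloring that cannot be extended is abandoned as soon as it is reached.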

We'll  abandon  the  construction  problem  in  favor  of  the  counting  problem.  We  will  prove 

Theorem 6.3  Chromatic polynomial existence  Let G be a simple graph with n vertices.
There is a polynomial $P_G(x)$ of degree n such that for each positive integer m, the number of
ways to properly color G using m colors is $P_G(m)$.

$P_G(x)$ is called the chromatic polynomial of G. Various properties of it are found in the exercises.

We'll give two proofs of the theorem. The first is simple but it does not provide a useful method
for determining $P_G(x)$. The second is more complicated, but the steps of the proof provide a recursive
method for calculating chromatic polynomials. We'll explore the method after the proof.

Figure 6.2  Forming $G - e$ and $G_e$ from G by deletion and contraction.


Proof: (Nonconstructive) Let $d_G(k)$ be the number of ways to properly color the graph using
k colors such that each color is used at least once. Clearly $d_G(k) = 0$ when k > n, the number of
vertices of G. If we are given x colors, then the number of ways we can use exactly k of them to
properly color G is $\binom{x}{k} d_G(k)$. (Choose k colors AND use them.) If $d_G(k) \ne 0$, this is a polynomial
in x of degree k because $\binom{x}{k} = x(x-1)\cdots(x-k+1)/k!$, a polynomial in x of degree k. Since the
number of colors actually used is between 1 and n,

$$P_G(x) = \sum_{k=1}^{n} \binom{x}{k} d_G(k), \qquad (6.3)$$

a sum of a finite number of polynomials in x. Note that

• $d_G(n) \ne 0$, since it is always possible to color a graph with each vertex colored differently and

• the k = n term in the sum (6.3) is the only term in the sum that has degree n.

Thus, $P_G(x)$ is a polynomial in x of degree n. (We needed to know that there was only one term of
degree n because otherwise there might be cancellation, giving a polynomial of lower degree.) □
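As a small check of (6.3), here is a worked instance for a graph chosen purely for illustration: the path on three vertices. One color cannot be used alone because the graph has an edge, two colors can be used in exactly two ways (the end vertices share one color and the middle vertex takes the other), and three colors can be placed in 3! ways.

```latex
\begin{align*}
d_G(1) &= 0, \qquad d_G(2) = 2, \qquad d_G(3) = 3! = 6,\\
P_G(x) &= \binom{x}{2}\cdot 2 + \binom{x}{3}\cdot 6
        = x(x-1) + x(x-1)(x-2)
        = x(x-1)^2 .
\end{align*}
```

The result has degree 3, the number of vertices, as the proof predicts.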


Proof: (Constructive) Let G = (V, E). We'll use induction on the number of edges in the
graph. If the graph has no edges, then $P_G(x) = x^n$ because any function from vertices to colors is
acceptable as a coloring in this case.

You may find Figure 6.2 helpful. Suppose $e = \{u, v\} \in E$. Let $G - e = (V, E - \{e\})$, a subgraph of
G. This is called deleting the edge e. Every proper coloring of G is a proper coloring of $G - e$, but
not conversely: a proper coloring $\lambda$ of $G - e$ is a proper coloring of G if and only if $\lambda(u) \ne \lambda(v)$.
A proper coloring $\lambda$ of $G - e$ with $\lambda(u) = \lambda(v)$ can be thought of as a proper coloring of a graph
$G_e$ in which u and v have been identified. This is called contracting the edge e. Let's define $G_e$
precisely. Choose $\zeta \notin V$, let $V_e = V \cup \{\zeta\} - \{u, v\}$ and let $E_e$ be all two element subsets of $V_e$ in E
together with all sets $\{\zeta, y\}$ for which either $\{u, y\} \in E$ or $\{v, y\} \in E$ or both. The proper colorings
of $G_e = (V_e, E_e)$ are in one-to-one correspondence with the proper colorings $\lambda$ of $G - e$ for which
$\lambda(u) = \lambda(v)$; we simply have $\lambda(\zeta) = \lambda(u) = \lambda(v)$. Thus every proper coloring of $G - e$ is a proper
coloring of either G or $G_e$, but not both. By the Rule of Sum, $P_{G-e}(x) = P_G(x) + P_{G_e}(x)$ and so

$$P_G(x) = P_{G-e}(x) - P_{G_e}(x). \qquad (6.4)$$

Since $G - e$ and $G_e$ have fewer edges than G, it follows by induction that $P_{G-e}(x)$ is a polynomial of
degree |V| = n and that $P_{G_e}(x)$ is a polynomial of degree $|V_e| = n - 1$. Thus (6.4) is a polynomial
of degree n. □

Example 6.8  Some chromatic polynomial calculations  What is the chromatic polynomial
of the graph with all $\binom{n}{2}$ possible edges present? In this case each vertex is connected to every other
vertex by an edge so each vertex has a different color. Thus we get $x(x-1)(x-2)\cdots(x-n+1)$.
The graph with all edges present is called the complete graph and is denoted by $K_n$.

What is the chromatic polynomial of the n vertex graph with no edges? We can color each
vertex any way we choose, so the answer is $x^n$. By using the first proof of the theorem, we will
obtain another formula for this chromatic polynomial. The graph can be colored using k colors
(with each color used) by first partitioning the vertices into k blocks and then assigning a color to
each block. By the Rule of Product, $d_G(k) = S(n,k)\,k!$, where S(n, k) is a Stirling number of the
second kind, introduced in Example 1.27. By the first proof and the fact that $P_G(x) = x^n$, we have

$$x^n = \sum_{k=1}^{n} \binom{x}{k} S(n,k)\,k! = \sum_{k=1}^{n} x(x-1)\cdots(x-k+1)\,S(n,k). \qquad (6.5)$$

What is the chromatic polynomial of a path containing n vertices? Let the vertices be
$\underline{n} = \{1, \dots, n\}$ and the edges be {i, i + 1} for $1 \le i < n$. Color vertex 1 using any of x colors. If the
first i vertices have been colored, color vertex i + 1 using any of the x − 1 colors different from the
color used on vertex i. Thus we see that the chromatic polynomial of the n vertex path is $x(x-1)^{n-1}$.

We now consider a more complicated problem. What is the chromatic polynomial of the graph
that consists of just one cycle? Let n be the length of the cycle and let the answer be $C_n(x)$. Call
the graph $Z_n$. Since $Z_2$ is just two connected vertices, $C_2(x) = x(x-1)$. It is easy to calculate
$C_3(x)$: color one vertex arbitrarily, color the next vertex in a different color and color the last vertex
different from the first two. Thus $C_3(x) = x(x-1)(x-2)$. What is $C_4(x)$? We use (6.4). If we delete
an edge e from $Z_4$, we obtain a path on 4 vertices, which we have dealt with. If we contract e, we
obtain $Z_3$, which we have dealt with. Thus

$$C_4(x) = x(x-1)^3 - x(x-1)(x-2).$$

What is the general formula? The previous argument generalizes to

$$C_n(x) = x(x-1)^{n-1} - C_{n-1}(x) \quad \text{for } n > 2. \qquad (6.6)$$

How can we solve this recursion? This is not so clear. It is easier to see if we repeatedly use Figure 6.2
to expand things out until we obtain paths. The result for $Z_5$ is shown in the left side of Figure 6.3.
In the right side of Figure 6.3, we have replaced each leaf by its chromatic polynomial. We can now
work upwards from the leaves to compute the chromatic polynomial of the root.
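The expand-and-work-upwards computation can be mechanized. The Python sketch below applies (6.4) recursively, with two departures from the text that should be read as our assumptions: it expands all the way down to graphs with no edges (where the polynomial is $x^n$) instead of stopping at paths, and when contracting {u, v} it reuses the label u for the merged vertex instead of introducing a new one. The function names are hypothetical, and the running time grows exponentially with the number of edges, so this is only suitable for small graphs.

```python
def chromatic_polynomial(vertices, edges):
    """Coefficients [a_0, a_1, ..., a_n] of P_G(x) for a simple graph G.

    Uses deletion and contraction: P_G = P_{G-e} - P_{G_e}.
    """
    vertices = frozenset(vertices)
    edges = frozenset(frozenset(e) for e in edges)
    if not edges:
        # No edges: every function from vertices to colors is proper.
        return [0] * len(vertices) + [1]
    e = next(iter(edges))
    u, v = tuple(e)
    deleted = edges - {e}
    # Contract e by renaming v to u; storing edges in a set merges any
    # parallel edges that the contraction would create.
    contracted = frozenset(
        frozenset(u if w == v else w for w in f) for f in deleted
    )
    p_del = chromatic_polynomial(vertices, deleted)            # degree n
    p_con = chromatic_polynomial(vertices - {v}, contracted)   # degree n - 1
    return [a - b for a, b in zip(p_del, p_con + [0])]


def evaluate(coeffs, x):
    """Evaluate the polynomial with the given coefficient list at x."""
    return sum(a * x**k for k, a in enumerate(coeffs))


# The cycle Z_5; compare with the closed form obtained in this example.
n = 5
z5 = [(i, (i + 1) % n) for i in range(n)]
coeffs = chromatic_polynomial(range(n), z5)
print(coeffs)
for x in range(1, 7):
    assert evaluate(coeffs, x) == (x - 1) ** n + (-1) ** n * (x - 1)
```

The final loop checks the computed polynomial against formula (6.7) of this example at several integer values of x.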

This is a good way to do it for a graph that has no nice structure. In this case, however, we
can write down the chromatic polynomial of the root directly. Notice that the kth leaf from the left
(counting the leftmost as 0) has chromatic polynomial $x(x-1)^{n-k-1}$. If k is even it appears in the
chromatic polynomial of the root with a plus sign, while if k is odd it appears with a minus sign.
This shows that, for n > 1,

$$C_n(x) = \sum_{k=0}^{n-2} (-1)^k x(x-1)^{n-k-1} = x(x-1)^{n-1} \sum_{k=0}^{n-2} \left(\frac{-1}{x-1}\right)^{k},$$

which is a geometric series. Thus

$$C_n(x) = x(x-1)^{n-1}\, \frac{\left(\frac{-1}{x-1}\right)^{n-1} - 1}{\frac{-1}{x-1} - 1}.$$

You should show that this simplifies to

$$C_n(x) = (x-1)^n + (-1)^n (x-1) \qquad (6.7)$$

for n > 1. □

Figure 6.3  The calculation for $C_5(x)$ expanded. $P_n$ is the n vertex path.


Exercises 


6.2.1.  Give  an  alternate  proof  of  (6.7)  by  induction  as  follows:  Prove  by  substitution  that  (6.7)  satisfies 
(6.6)  and  show  that  (6.7)  is  correct  for  n  =  2. 

6.2.2. Conjecture and prove a formula for the chromatic polynomial of a tree. Your formula may include the
number of vertices, the degrees of the vertices and anything else that you need. Be sure to indicate
how you arrived at your conjecture. This formula can be useful in computing chromatic polynomials
by the recursive method.

Hint.  There  is  a  simple  formula. 

6.2.3.  The  results  in  this  exercise  make  it  easier  to  calculate  some  chromatic  polynomials  using  the  recursive 
method. 

(a)  Suppose  that  G  consists  of  two  graphs  H  and  K  with  no  vertices  in  common.  Prove  that 
$P_G(x) = P_H(x)\,P_K(x)$.


(b)

(c) Suppose that G is formed by taking two graphs H and K with no vertices in common, choosing
vertices $v \in H$ and $w \in K$, and adding the edge {v, w}. Express $P_G(x)$ in terms of $P_H(x)$ and
$P_K(x)$.

6.2.4. Let n and k be integers such that 1 < k < n − 1. Let G have $V = \underline{n}$ and edges {i, i + 1} for $1 \le i < n$,
{1, n}, and {k, n}. Thus G is $Z_n$ with one additional edge. Obtain a formula for $P_G(x)$.

6.2.5. Let $L_n$ be the simple graph with $V = \underline{n} \times \underline{2}$ and {(i, j), (i', j')} an edge if and only if i = i' or
j = j' and |i − i'| = 1. The graph looks somewhat like a ladder and has 3n − 2 edges. Prove that the
chromatic polynomial of $L_n$ is $(x^2 - 3x + 3)^{n-1} x(x-1)$.

6.2.6. Let G be the simple graph with $V = \underline{3} \times \underline{3}$ and {(i, j), (i', j')} an edge if and only if |i − i'| + |j − j'| = 1.
It looks like a 2 × 2 board. Compute its chromatic polynomial.

Hint. It appears that any way it is done requires some computation. Using the edges joined to (2, 2)
for deletion and contraction is helpful.

*6.2.7. Construct a graph from the cube by removing the interior and the faces and leaving just the vertices
and edges. What is the chromatic polynomial of this graph? (There is not some quick neat way to
do this. Quite a bit of careful calculation is involved. A bit of care in selecting edges for removal and
contraction will help.)

6.2.8. Give a proof of (6.5) by counting all functions from $\underline{n}$ to $\underline{x}$ in two ways.

6.2.9. Adapt the second proof that $P_G(x)$ is a polynomial of degree n to prove that the coefficients
of the polynomial alternate in sign; that is, the coefficient of $x^k$ is a nonnegative multiple of
$(-1)^{n-k}$.

Planar Graphs


Recall that drawing a graph in the plane without edges crossing is called embedding the graph in
the plane. Any graph that can be embedded in the plane can be embedded in the sphere (i.e., the
surface  of  a  ball)  and  vice  versa.  The  idea  is  simple:  Cut  a  little  hole  out  of  the  sphere  in  such  a 
way  that  you  don't  remove  any  of  the  graph,  then,  pretending  the  sphere  is  a  rubber  sheet,  stretch 
it  flat  to  form  a  disc.  Conversely,  any  map  on  the  plane  is  bounded,  so  we  can  cut  a  disc  containing 
a  map  out  of  the  plane  and  curve  it  around  to  fit  on  a  sphere.  Thus,  studying  maps  on  the  plane  is 
equivalent  to  studying  maps  on  the  sphere. 

Sometimes  fairly  simple  concepts  in  mathematics  lead  to  a  considerable  body  of  research.  The 
research  related  to  planar  graphs  is  among  the  most  accessible  such  bodies  for  someone  without 
extensive  mathematical  training.  Here  are  some  of  the  research  highlights  and  what  we'll  be  doing 
about  them. 

1.  The  earliest  is  probably  Euler's  relation,  which  we'll  discuss  soon.  If  the  sphere  is  cut  along  the 
edges  of  an  embedded  connected  graph,  we  obtain  pieces  called  faces.  Euler  discovered  that  the 
number of vertices and faces together differed from the number of edges by a constant. This has
been  extended  to  graphs  embedded  in  other  surfaces  and  to  generalizations  of  graphs  in  higher 
dimensions.  The  result  is  an  important  number  associated  with  a  generalized  surface  called  its 
Euler  characteristic. 

2.  The  four  color  problem  has  already  been  mentioned  in  the  section  on  chromatic  polynomials.  As 
noted  there,  it  has  been  generalized  to  other  surfaces.  We'll  use  Euler's  relation  to  prove  that 
five  colors  suffice  on  the  plane. 

3. A description of those graphs which can be drawn in the plane was obtained some time ago by
Kuratowski: A graph is planar if and only if it does not "contain" either

• $K_5$, the five vertex complete graph, or

• $K_{3,3}$, the graph with $V = \{a_1, a_2, a_3, b_1, b_2, b_3\}$ and all nine edges of the form $\{a_i, b_j\}$.

We say that G contains H if, except for labels, we can obtain H from G by repeated use of the
three operations:

(a) delete an edge,

(b) delete a vertex that lies on no edges and

(c) if v lies only on the edges $e_1 = \{v, a_1\}$ and $e_2 = \{v, a_2\}$, delete v, $e_1$ and $e_2$ and add the
edge $\{a_1, a_2\}$.

Research has developed in two directions. One is algorithmic: Find good algorithms for deciding
if a graph is planar and, if so, for embedding it in the plane. We'll discuss the algorithmic
approach a bit. The other direction is more theoretical: Develop criteria like Kuratowski's for
other surfaces. It has recently been proved that such criteria exist: There is always a finite list
of graphs, like $K_5$ and $K_{3,3}$, that are bad. We will not be able to pursue this here; in fact, we
will not even prove Kuratowski's Theorem.

4. Let's allow loops in our graphs. A graph embedded in the plane has a dual. A dual is constructed
as follows for an embedded graph G. Place a vertex of D(G) in each face of G. Thus there is a
bijection between faces of G and vertices of D(G). Every edge e of G is crossed by exactly one
edge e' of D(G), and vice versa, as follows. Let the face on one side of e be $f_1$ and the face on
the other side be $f_2$. (Possibly, $f_1 = f_2$.) If $v'_1$ and $v'_2$ are the corresponding vertices in D(G),
then e' connects $v'_1$ and $v'_2$. Attempting to extend the idea of a dual to other graphs leads to
what are called "matroids" or "combinatorial geometries." We won't discuss this subject at all.

The algorithmic subsection does not require the earlier material, but it does require some
knowledge of spanning trees, which were discussed in Section 6.1.

Our terminology "G contains H" in Item 3 is not standard. People are likely to say "G contains
H homeomorphically." You should note that this is not the same as H being a subgraph of G.
Repeated application of Rules (a) and (b) gives a subgraph of G. Conversely, all subgraphs of G can
be obtained this way. Rule (c) allows us to contract an edge if it has an endpoint of degree 2. The
result is not a subgraph of G. For example, applying (c) to a cycle of length 4 produces a cycle of
length 3.
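Rule (c) is easy to try out on small examples. The Python sketch below represents a simple graph by its set of edges, each edge a two element frozenset; the function name `apply_rule_c` is hypothetical. Because the edges are kept in a set, the sketch assumes that $a_1$ and $a_2$ are not already adjacent; otherwise the new edge would duplicate an existing one and silently merge with it.

```python
def apply_rule_c(edges, v):
    """Apply operation (c): remove v and its two edges, then join its neighbours."""
    incident = [e for e in edges if v in e]
    if len(incident) != 2:
        raise ValueError("v must lie on exactly two edges")
    (a1,) = incident[0] - {v}
    (a2,) = incident[1] - {v}
    remaining = {e for e in edges if v not in e}
    return remaining | {frozenset((a1, a2))}


# A cycle of length 4 on the vertices 1, 2, 3, 4.
cycle4 = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 1)]}
cycle3 = apply_rule_c(cycle4, 4)
print(sorted(tuple(sorted(e)) for e in cycle3))  # [(1, 2), (1, 3), (2, 3)]
```

The result is the cycle of length 3 on the vertices 1, 2, 3, which is not a subgraph of the original cycle because the edge {1, 3} is new.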

Euler's  Relation 


We'll  state  and  prove  Euler's  relation  and  then  examine  some  of  its  consequences. 

Theorem 6.4  Euler's relation  Let $G = (V, E, \varphi)$ be a connected graph. Suppose that G
has been embedded in the plane (or sphere) and that the embedding has f faces. Then

$$|V| - |E| + f = 2. \qquad (6.8)$$

This  remains  true  if  we  extend  the  notion  of  a  graph  to  allow  loops. 

Proof: In Exercise 5.4.3 (p. 140) you were asked to prove that v = e + 1 for trees. Since cutting
along the edges of a tree does not make the plane (or sphere) fall apart, f = 1. Thus Euler's relation
holds for all trees.

Is there some way we can prove the general result by using the fact that it is true for trees? This
suggests induction, but on what should we induct? By Exercise 5.4.2 (p. 139), a tree has the least
number of edges of any connected v-vertex graph. Thus we should somehow induct on the number of
edges. With care, we could do this without reference to v, but there are better ways. One possibility
is to induct on d = e − v, where d ≥ −1. Trees correspond to the case d = −1 and so start the
induction.

Another approach to the induction is to consider v to be fixed but arbitrary and induct on e.
From this viewpoint, our induction starts at e = v − 1, which is the case of trees.
The two approaches are essentially the same. We'll take the latter approach.

Let $G = (V, E, \varphi)$ be any connected graph embedded in the plane. From now on, we will work
in the plane, removing edges from this particular embedding of G. By Exercise 5.4.2, G contains a
spanning tree T = (V, E'), say. Let $x \in E - E'$; that is, some edge of G not in T. The subgraph G'
induced by E − {x} is still connected since it contains T.

Let e' = e − 1 and f' be the number of edges and faces of G'. By the induction assumption, G'
satisfies (6.8) and so v − e' + f' = 2. We will soon prove that the opposite sides of x are in different
faces of G. Thus, removing x merges these two faces and so f' = f − 1. Hence
v − e + f = v − (e' + 1) + (f' + 1) = 2, which completes the inductive step.

We now prove our claim that opposite sides of x lie in different faces. Suppose $\varphi(x) = \{a, b\}$. Let
P be a path from a to b in T. Adding x to the path produces a cycle C. Any face of the embedding
must lie entirely on one side of C. Since one side of x is inside C and the other side of x is outside
C, the two sides of x must lie in different faces. □

One interesting consequence of Euler's relation is that it tells us that no matter how we embed
a graph G = (V, E) in the plane (and there are often many ways to do so), it will always have
|E| − |V| + 2 faces. We'll derive more interesting results.

Corollary 1  If G is a planar connected simple graph with e edges and v > 2 vertices, then
e ≤ 3v − 6.

Proof: When we trace around the boundary of a face of G, we encounter a sequence of vertices
and edges, finally returning to our starting position. Call the sequence $v_1, e_1, v_2, e_2, \dots, v_d, e_d, v_1$. We
call d the degree of the face. Some edges may be encountered twice because both sides of them are
on the same face. A tree is an extreme example of this: Every edge is encountered twice. Thus the
degree of the face of a tree with e edges is 2e. Let $f_k$ be the number of faces of degree k. Since there
are no loops or multiple edges, $f_1 = f_2 = 0$. If we trace around all faces, we encounter each edge
exactly twice. Thus

$$2e = \sum_{k \ge 3} k f_k \ge \sum_{k \ge 3} 3 f_k = 3f$$

and so f ≤ 2e/3. Consequently, 2 = v − e + f ≤ v − e + 2e/3. Rearrangement gives the corollary. □

Example 6.9  Nonplanarity of $K_5$  The graph $K_5$ has 5 vertices and 10 edges and so
3v − 6 = 9 < e. Thus, it cannot be embedded in the plane. □
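The test used in this example is a one-line computation, sketched below in Python with a hypothetical function name. Passing the test does not show that a graph is planar: $K_{3,3}$, with 6 vertices and 9 edges, satisfies the inequality even though Exercise 6.3.2 asks for a proof that it is not planar.

```python
def satisfies_edge_bound(v, e):
    """Check e <= 3v - 6, which Corollary 1 requires of every planar
    connected simple graph with v > 2 vertices and e edges."""
    return e <= 3 * v - 6


print(satisfies_edge_bound(5, 10))  # K_5: False, so K_5 is not planar
print(satisfies_edge_bound(6, 9))   # K_{3,3}: True, so this test is inconclusive
```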

The  following  result  will  be  useful  in  discussing  coloring. 

Corollary 2  If G is a planar connected simple graph, then at least one vertex of G has degree
less than 6.

Proof: We suppose that the conclusion of the corollary is false and derive a contradiction. Let $v_k$
be the number of vertices of degree k. Since each edge contains two vertices, $2e = \sum_k k v_k$. Since no
vertex has degree less than 6, $v_k = 0$ for k < 6 and so

$$2e = \sum_{k \ge 6} k v_k \ge 6v.$$

Thus e ≥ 3v, contradicting the previous corollary when v > 3. If v ≤ 3, there are at most 3 edges,
so the result is trivial. □


Exercises 

6.3.1. Suppose that G is a planar connected graph with e edges and v > 2 vertices and that it contains no
cycles of length 3. Prove that $e \le 2v - 4$.

6.3.2. Prove that $K_{3,3}$ is not a planar graph. ($K_{3,3}$ was defined on page 162.)

6.3.3. We will call a connected graph embedded in the sphere a regular graph if all vertices have the same
degree, say $d_v$, and all faces have the same degree, say $d_f$.

(a) If e is the number of edges of the graph, prove that

$$e\left(\frac{2}{d_v} + \frac{2}{d_f} - 1\right) = 2. \tag{6.9}$$

(b) The possible graphs with $d_v = 2$ are simple to describe. Do it.

(c) By the previous part, we may as well assume that $d_v \ge 3$. Since both sides of (6.9) are positive,
conclude that one of $d_v$ and $d_f$ must be 3 and that the other is at most 5.

(d) Draw all embedded regular graphs with $d_v > 2$.

6.3.4. Chemists have discovered a compound whose molecule is shaped like a hollow ball and consists
of sixty carbon atoms. It is called buckminsterfullerene. (This is not a joke.) There is speculation
that this and similar carbon molecules may be common in outer space. A chemical compound can
be thought of as a graph with the atoms as vertices and the chemical bonds as edges. Thus
buckminsterfullerene can be viewed as a graph on the sphere. Because of the properties of carbon, it is
reasonable to suppose that each atom is bound to exactly 3 others and that the faces of the
associated embedded graph are either hexagons or pentagons. How many hexagons are there? How many
pentagons are there?

Hint.  One  method  is  to  determine  the  number  of  edges  and  then  obtain  two  relations  involving  the 
number  of  pentagons  and  number  of  hexagons,  one  by  counting  edges  and  another  from  Euler's 
relation. 

*6.3.5. A graph can be embedded on other finite surfaces besides the sphere. In this case, there is usually
another condition: If we cut along all the edges of the graph, we get faces of the embedded graph
and they all look like stretched polygons. This is called properly embedding the graph. To see that
not all embeddings are proper, consider a torus (surface of a donut). A 3-cycle can be embedded
around the torus like a bracelet. When we cut along the edges and straighten out the result, we have
a cylinder, not a stretched polygon.

For proper embeddings in any surface, there is a relation like Euler's relation (6.8):
$|V| - |E| + f = c$, but the value of c depends on the surface. For the sphere, it is 2.

(a)  Properly  embed  some  graph  on  the  torus  and  compute  c  for  it. 

(b)  Prove  that  your  value  in  (a)  is  the  same  for  all  proper  embeddings  of  graphs  on  the  torus. 

Hint.  Cut  around  the  torus  like  a  bracelet,  avoiding  all  vertices.  Fill  in  each  of  the  holes  with  a 
circle,  introducing  edges  and  vertices  along  the  cuts. 

The  Five  Color  Theorem 


Our  goal  is  to  prove  the  five  color  theorem: 

Theorem 6.5 Heawood's Theorem Every planar graph $G = (V,E)$ can be properly
colored with five colors (i.e., adjacent vertices have distinct colors).

Although four colors are enough, we will not prove that since the only known method is quite
technical and requires considerable computing resources. On the other hand, if we were satisfied
with six colors, the proof would be much easier. We'll begin with it because it lays the foundation for
five colors.

Proof:    (Six  colors)    The  proof  will  be  by  induction  on  the  number  of  vertices  of  G. 

A graph with at most six vertices can obviously be properly colored: Give each vertex a different
color. Thus we can assume G has more than six vertices. We can also assume that G is connected
because otherwise each component, which has fewer vertices than G, could be properly colored by the
induction hypothesis. This would give a proper coloring of G.

Let $x \in V$ be a vertex of G with smallest degree. By Corollary 2, $d(x) \le 5$. Let $G - x$ be the
graph induced by $V - \{x\}$. By the induction hypothesis, $G - x$ can be properly 6-colored. Since there
are at most 5 vertices adjacent to x in G, there must be a color available to use for x. □
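The induction in the six color proof unwinds into a simple procedure: repeatedly strip off a vertex of smallest degree, then put the vertices back in reverse order, giving each one a color not used by its already colored neighbors. Here is a minimal sketch under stated assumptions: the function name and the adjacency-dictionary representation are illustrative choices, not from the text, and the input is assumed to be a simple planar graph (for a nonplanar input the search for a free color may fail). The sample graph is the one from Exercise 6.3.6.

```python
def six_color(adj):
    """Properly color a simple planar graph with colors 0..5.

    adj maps each vertex to the set of its neighbors.
    """
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    # Peel off a vertex of smallest degree, as in the induction step.
    while remaining:
        x = min(remaining, key=lambda v: len(remaining[v]))
        order.append(x)
        for y in remaining.pop(x):
            remaining[y].discard(x)
    # Reinsert in reverse order; each vertex sees at most 5 colored neighbors.
    coloring = {}
    for x in reversed(order):
        used = {coloring[y] for y in adj[x] if y in coloring}
        coloring[x] = next(c for c in range(6) if c not in used)
    return coloring


EDGES = [(1, 2), (1, 3), (1, 4), (1, 7), (2, 3), (2, 4), (2, 5),
         (2, 6), (2, 7), (3, 4), (4, 7), (5, 6), (6, 7)]
adj = {v: set() for v in range(1, 8)}
for a, b in EDGES:
    adj[a].add(b)
    adj[b].add(a)
coloring = six_color(adj)
```

The sketch makes no attempt to use few colors; it only guarantees that six suffice when every removed vertex has degree at most 5, which Corollary 2 provides for planar graphs.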

This proof works for proving that G can be properly 5-colored except for one problem: The
induction fails if x has degree 5 and the coloring of $G - x$ is such that the 5 vertices adjacent to x
in G are all colored differently. We now show two different ways to get around this problem.

Proof: (Five colors, first proof) As noted above, we may assume that $d(x) = 5$. Label the vertices
adjacent to x as $y_1, \ldots, y_5$, not necessarily in any particular order. Not all of the $y_i$'s can be joined
by edges because we would then have $K_5$ as a subgraph of G and, by Example 6.9, $K_5$ is not planar.

Suppose that $y_1$ and $y_2$ are not joined by an edge. Erase the edges $\{x, y_j\}$ from the picture in
the plane for j = 3, 4, 5. Contract the edges $\{x, y_1\}$ and $\{x, y_2\}$. This merges x, $y_1$ and $y_2$ into a
single vertex which we call y. We now have a graph H in the plane with two fewer vertices than G.
By induction, we can properly 5-color H. Do so.

Color all vertices of G using the same coloring as for H, except for x, $y_1$ and $y_2$, which are
colored as follows. Give $y_1$ and $y_2$ the same color as y. Give x a color different from all the colors
used on the four vertices y, $y_3$, $y_4$ and $y_5$. (There must be one since we have five colors available.)
Since H was properly colored and $\{y_1, y_2\}$ is not an edge of G, we have properly colored G. □

Proof: (Five colors, second proof) As noted above, we may assume that $d(x) = 5$. Label the
vertices adjacent to x as $y_1, \ldots, y_5$, reading clockwise around x. Properly color the subgraph induced
by $V - \{x\}$. Let $c_i$ be the color used for $y_i$. As noted above we may assume that the $c_i$'s are distinct,
for otherwise we could simply choose a color for x different from $c_1, \ldots, c_5$.

Call this position in the text HERE for future reference. Let H be the subgraph of $G - x$
induced by those vertices that have the same colors as $y_1$ and $y_3$. Either $y_1$ and $y_3$ belong to
different components of H or to the same component. We consider the cases separately.

Suppose $y_1$ and $y_3$ belong to different components. Interchange $c_1$ and $c_3$ in the component
containing $y_1$. We claim this is still a proper coloring of $G - x$. Why? First, it is still a proper
coloring of the component of H containing $y_1$ and hence of H. Second, the vertices not in H are
colored with colors other than $c_1$ and $c_3$, so those vertices adjacent to vertices in H remain properly
colored. We have reduced this situation to the earlier case since only 4 colors are now used for the $y_i$.

Suppose that $y_1$ and $y_3$ belong to the same component. Then there is a path in H from $y_1$ to
$y_3$. Add the edges $\{x, y_1\}$ and $\{x, y_3\}$ to the path to obtain a cycle. This cycle can be viewed as
dividing the plane into two regions, the inside and the outside. Since $y_2$ and $y_4$ are on opposite sides
of the cycle, any path joining them must contain a vertex on the cycle. Since all vertices on the cycle
except x are colored $c_1$ or $c_3$, there is no path of vertices colored $c_2$ and $c_4$ in $G - x$ joining $y_2$ and $y_4$.
Now go back to HERE, using the subscripts 2 and 4 in place of 1 and 3. Since $y_2$ and $y_4$ will belong
to different components, we will not return to this paragraph. Thus, the proof is complete. □

6.3.6. Let G be the simple graph with V = {1, ..., 7} and edge set

E = {{1,2}, {1,3}, {1,4}, {1,7}, {2,3}, {2,4},
{2,5}, {2,6}, {2,7}, {3,4}, {4,7}, {5,6}, {6,7}}.

(a) Embed G in the plane.

(b) The vertices of V have a natural ordering. Thus a function specifying a coloring of V can be
written in one-line form. In this one-line form, what is the lexicographically least proper coloring
of G when the available "colors" are a, b, c, d, e and f?

(c) What is the lexicographically least proper coloring of G using the colors a, b, c and d?
Notice that the lexicographically least proper coloring in (b) uses five colors but there is another
coloring that uses only four colors in (c).

6.3.7. Prove that if G is a planar graph with V = {1, ..., 5}, then the lexicographically least proper coloring of G
using the colors a, b, c, d and e uses only 4 colors.

6.3.8. Suppose V = {1, ..., 6} and the available colors are a, b, c, d, e. Find a planar graph G with vertex set V
such that the lexicographically least proper coloring of G is not a four coloring of G.

*6.3.9. What's wrong with the following idea for showing that 4 colors are enough? By the argument for
5 colors, our only problem is a vertex of degree 5 or less such that the adjacent vertices require 4
colors. Use the idea in the second argument in the text. With a vertex of degree 4, the argument can be
used as stated in the text with all mention of $y_5$ deleted.

Now suppose the vertex has degree 5. The argument in the text guarantees that $y_1, \ldots, y_5$ can
be colored with 4 colors. Do so. We can still select two vertices that are colored differently and are
not adjacent in the clockwise listing. Thus the argument given in the text applies and we can reduce
the number of colors needed for $y_1, \ldots, y_5$ from 4 to 3.

*6.3.10.    In  Exercise  6.3.5  we  looked  at  proper  embeddings  of  graphs  in  a  torus.  It  can  be  shown  that  every 
embedding  (proper  or  not)  of  a  graph  in  a  torus  can  be  properly  colored  using  at  most  seven  colors. 
Find  a  graph  embedded  in  a  torus  that  requires  seven  colors. 
Hint.  There  is  one  with  just  seven  vertices. 


*Algorithmic  Questions 


We do not know of an algorithm for 4 coloring a planar graph in a reasonable time. The first proof
of the Five Color Theorem leads to a reasonable algorithm for coloring a planar graph with 5 colors.
At each step, the number of vertices is decreased by 2. Once the reduced graph is colored, it is a
simple matter to color the given graph: Adjust the previous coloring in $O^+(1)$ time¹ by assigning
colors to the three vertices x, $y_1$ and $y_2$ that were merged to form y. If
R(n) is the maximum time needed to reduce an n vertex planar graph, then the total time is²

$$R(n) + R(n-2) + \cdots + O^+(n),$$

where the $O^+(n)$ is due to the roughly n/2 times the coloring must be adjusted. Note that R(n)
comes primarily from finding a vertex of degree 5 or less. It should be fairly clear to you that R(n)
is $O^+(n)$ and so the time for coloring is $O^+(n^2)$. One would expect that using a sophisticated data
structure to represent the graph would allow us to improve this time. We will not pursue this here.
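To see where the quadratic bound comes from, here is a short worked estimate. It assumes, as the text suggests, that the reduction time satisfies $R(m) \le Cm$ for all m; the constant C is introduced only for this illustration.

```latex
\begin{align*}
R(n) + R(n-2) + R(n-4) + \cdots
  &\le C\bigl(n + (n-2) + (n-4) + \cdots\bigr) \\
  &\le C\,\frac{(n+1)^2}{4}.
\end{align*}
```

The last step uses $n + (n-2) + \cdots + 2 = n(n+2)/4$ for even n and $n + (n-2) + \cdots + 1 = (n+1)^2/4$ for odd n. Adding the linear cost of adjusting the colorings leaves the total quadratic in n.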

We now turn our attention to the problem of deciding if a graph is planar. At first glance,
there is no obvious algorithm for doing this except to use Kuratowski's Theorem. This involves a
tremendous amount of computer time because there are so many possible choices for the operations
(a)-(c) described at the start of Section 6.3. Because of this, it is somewhat surprising that there are
algorithms with worst case running times that are $\Theta(|V|)$. We'll examine some of the ideas associated
with one such algorithm here but will not develop the complex data structures needed to achieve
$\Theta(|V|)$ running time.

To  check  if  a  graph  is  planar,  it  suffices  to  merge  multiple  edges  and  to  check  that  each  connected 
component  is  planar.  This  should  be  obvious.  A  bit  less  obvious  is  the  fact  that  it  suffices  to  check 
each  biconnected  component.  (This  was  defined  in  Example  6.4  (p.  154).)  To  see  this,  note  that  we 
can  begin  by  embedding  one  bicomponent  in  the  plane.  Suppose  that  some  subgraph  H  of  the  given 
graph  has  been  already  embedded.  Any  unembedded  bicomponent  C  that  shares  a  vertex  v  with  H 
can then be embedded in a face of H adjacent to v in such a way that the v of C and the v of H
can  be  merged  into  one  vertex.  This  process  can  be  iterated.  Because  of  this,  we  limit  our  attention 
to  simple  biconnected  graphs. 


¹ The notation for $O^+(\,)$ is discussed in Appendix B.

² Recall that $f(n) + O^+(n)$ means f(n) plus some function that is in $O^+(n)$.

Definition 6.4 st-labeling Let $G = (\underline{n}, E)$ be a simple biconnected graph with $\{s,t\} \in E$.
An st-labeling of G is a permutation $\lambda$ of $\underline{n} = \{1, \ldots, n\}$ such that

(a) $\lambda(s) = 1$,

(b) $\lambda(t) = n$ and,

(c) whenever $1 < \lambda(v) < n$, there are vertices u and w adjacent to v such that
$\lambda(u) < \lambda(v) < \lambda(w)$.

We  will  soon  prove  that  such  a  vertex  labeling  exists  and  give  a  method  for  finding  it,  but  first, 
we  explain  how  to  use  it. 

Start embedding G in the plane by drawing the edge {s,t}. Suppose that we have managed to
embed the subgraph $H_k$ induced by the vertices with $\lambda(v) < k$ together with t. If $\lambda(x) = k$, then
we must embed x in the same face as t. Why is this? By (c), there exist vertices $x = w_1, w_2, \ldots = t$
such that $\{w_i, w_{i+1}\}$ is an edge and $\lambda(w_i)$ is an increasing function of i. Since none of these vertices
except t have been embedded, they must all lie in the same face of $H_k$. Thus we know which face
of $H_k$ to put x in. Unfortunately, parts of our embedding of $H_k$ may be wrong in the sense that we
cannot now embed x to construct $H_{k+1}$. How can we correct that?

The previous observation implies that we can start out by placing the vertices of G in the plane
so that v is at $(\lambda(v), 0)$. Correcting the problems with $H_k$ can be done by redrawing edges without
moving the vertices. There are systematic, but rather complicated, ways of finding a good redrawing
of $H_k$ (or proving that none exists if the graph is not planar). We will not discuss them. For relatively
small graphs, this can be done by inspection.

We now present an algorithm for finding an st-labeling. The validity of the algorithm implies that
such a labeling exists, thus completing the proof of our planarity algorithm. We'll find an injection
$\lambda: \underline{n} \to \mathbb{R}$ that satisfies (a)-(c) in Definition 6.4. It is easy to construct an st-labeling from such a $\lambda$
by replacing the kth smallest value of $\lambda$ with k. Suppose that we specified $\lambda$ for a subgraph $G_Z$ of G
induced by some subset Z of the vertices of G. We assume that $s, t \in Z$ and that $\lambda$ satisfies (a)-(c)
on $G_Z$. Suppose that the edge {x,y} of G is not in $G_Z$. Since G is biconnected, there is a cycle in G containing
{x,y} and {s,t}. This means that there are disjoint paths in G either

(i) from s to x and from t to y or

(ii) from s to y and from t to x.

If  (ii)  holds,  interchange  the  meanings  of  x  and  y  to  convert  it  to  (i).  Thus  we  may  assume  that  (i) 
holds. 

Let the paths in (i) be $s = u_1, u_2, \ldots = x$ and $t = w_1, w_2, \ldots = y$. Suppose that $u_i$ and $w_j$ are
the last vertices in Z on these two paths. Let $v_k$ be the kth vertex on the path

$$u_i, u_{i+1}, \ldots, x, y, \ldots, w_{j+1}, w_j \tag{6.10}$$

and let m be the total number of vertices on the path. Except for $u_i$ and $w_j$, none of the vertices in
(6.10) are in Z. Add them to Z and define $\lambda(v_k)$ for $k \ne 1, m$ such that $\lambda$ is still an injection and
$\lambda(v_k)$ is monotonic on the path. In other words, if $\lambda(u_i) < \lambda(w_j)$, then $\lambda$ will be strictly increasing
on the path, and otherwise it will be strictly decreasing. Making $\lambda$ an injection is easy since there
are an infinite number of real numbers between $\lambda(u_i)$ and $\lambda(w_j)$. We leave it to you to show that
this function satisfies (c) for the vertices that were just added to Z.

Example 6.10 Let G be the simple graph with V = {1, ..., 7} and edge set

E = {{1,2}, {1,3}, {1,4}, {1,7}, {2,3}, {2,4},
{2,5}, {2,6}, {2,7}, {3,4}, {4,7}, {5,6}, {6,7}}.

We'll use the algorithm to construct a 2,5-labeling for it. The labeling $\lambda$ will be written in two line
form with blanks for unassigned values; i.e., for vertices not yet added to Z. To begin with Z = {2, 5}
and

$$\lambda = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ & 1 & & & 7 & & \end{pmatrix}.$$

Let {x, y} = {2, 6}. In this case, a cycle is 2, 6, 5. Thus $s = u_1 = x$, $t = w_1$ and $w_2 = y$. Hence i = 1,
j = 1 and the path (6.10) is s = x, y, t = 2, 6, 5. We must choose $\lambda(6)$ between 1 and 7, say 4. Thus

$$\lambda = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ & 1 & & & 7 & 4 & \end{pmatrix}.$$

Let {x, y} = {3, 4}. A cycle is 3, 4, 7, 6, 5, 2 and the path (6.10) is 2, 3, 4, 7, 6. We must choose $\lambda(x)$,
$\lambda(y)$ and $\lambda(7)$ between 1 and 4 and monotonic increasing. We take

$$\lambda = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ & 1 & 2 & 3 & 7 & 4 & 3.5 \end{pmatrix}.$$

Finally, with {x, y}, a cycle is 2, 1, 4, 7, 6, 5 and the path (6.10) is 2, 1, 4. We take

$$\lambda = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 2.5 & 1 & 2 & 3 & 7 & 4 & 3.5 \end{pmatrix}.$$

Adjusting to preserve the order and get a permutation, we finally have

$$\lambda = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 3 & 1 & 2 & 4 & 7 & 6 & 5 \end{pmatrix}. \qquad \Box$$
This  algorithm  for  deciding  planarity  can  be  adapted  to  produce  an  actual  embedding  in  the 
plane,  if  the  graph  is  planar.  In  practical  problems,  one  usually  imposes  other  constraints  on  the 
embedding.  For  example,  if  one  is  looking  at  pictures  of  graphs,  the  embedding  should  spread  the 
vertices  and  edges  out  in  a  reasonably  nice  fashion.  On  the  other  hand,  in  VLSI  design,  one  often 
assumes  that  the  maximum  degree  of  the  vertices  is  at  most  four  and  requires  that  the  edges  be  laid 
out  on  a  regular  grid.  Other  VLSI  layout  problems  involve  vertices  which  take  up  space  on  the  plane 
and  various  edge  constraints.  In  VLSI  design,  one  would  also  like  to  keep  the  edge  lengths  as  small 
as  possible.  A  further  complication  that  arises  in  VLSI  is  that  the  graph  may  not  be  planar,  but 
we  would  like  to  draw  it  in  the  plane  with  relatively  few  crossings.  These  are  hard  problems — often 
they  are  "NP-hard,"  a  term  discussed  in  Section  B.3  (p.  377). 


Exercises 


6.3.11. Let G be the simple graph with V = {1, ..., 7} and edge set

E  =  {{1,2},  {1,3},  {1,4},  {1,7},  {2,3},  {2, 4}, 

{2,  5},  {2,  6},  {2,  7},  {3,  4},  {4,  7},  {5,  6},  {6,  7}}. 

(a)  Use  the  algorithm  in  the  text  to  construct  a  1,  7-labeling  for  G. 

(b)  Use  the  algorithm  in  the  text  to  construct  a  7, 1-labeling  for  G. 

6.3.12. Construct an st-labeling for $K_5$. (Since $K_5$ is completely symmetric, it doesn't matter what vertices
you choose for s and t.)

6.3.13. Construct an st-labeling for $K_{3,3}$. (Again, the symmetry guarantees that it doesn't matter which
edge is chosen for {s,t}.)

Figure 6.4 The adjacency lists of a simple graph for Exercise 6.3.16.

6.3.14.  We  return  to  Exercise  5.5.11:  Suppose  that  G  is  a  connected  graph  with  no  isthmuses.  We  want  to 
prove  that  the  edges  of  G  can  be  directed  so  that  the  resulting  directed  graph  is  strongly  connected. 

(a) Suppose that G is biconnected and has at least two edges. Use st-labelings to prove that the
result  is  true  in  this  case. 

(b)  Prove  if  a  graph  has  no  isthmuses,  then  every  bicomponent  has  at  least  two  edges. 

(c)  Complete  the  proof  of  this  exercise. 

6.3.15.  Prove  that  a  graph  is  biconnected  if  and  only  if  it  has  an  st-labeling. 

Hint. Given any edge different from {s, t}, use the properties of an st-labeling to find a cycle containing
that edge and {s,t}.

*6.3.16. $G = (\underline{30}, E)$ is a simple graph given by Figure 6.4. Row k lists the vertices of G that are adjacent to
k.  Embed  G  in  the  plane  using  the  algorithm  described  in  the  text.  Produce  a  5  coloring  of  G  and, 
if  you  can,  make  it  a  4  coloring. 


6.4   Flows  in  Networks 


We  now  discuss  "flows  in  networks,"  an  application  of  directed  graphs.  Examples  of  this  concept 
are  fluid  flowing  in  pipes,  traffic  moving  on  freeways  and  telephone  conversations  through  telephone 
company  circuits.  We'll  use  fluid  in  pipes  to  motivate  and  interpret  the  mathematical  concepts. 

The  Concepts 


In Figure 6.5 we see a simple directed graph. Note that its edges are labeled; e.g., the directed edge
$(D_1, P_1)$ has label 2. Imagine that the directed edges of the graph represent pipes through which
a fluid is flowing. A label on an edge represents the rate (measured in liters per second) at which
the fluid is flowing along the pipe represented by that edge. We denote this flow rate function by f.
Thus, $f(D_1, P_1) = 2$ liters/sec is an example of a value of f in Figure 6.5.

The vertices V in Figure 6.5 are divided into two classes,

$$\mathcal{D} = \{D_1, D_2, D_3, D_4\} \quad\text{and}\quad \mathcal{P} = \{P_1, P_2, P_3, P_4\}.$$

Think of the vertices in $\mathcal{D}$ as depots and those in $\mathcal{P}$ as pumps. Fluid can enter or leave the system
at a depot but not at a pump. This corresponds to practical experience with pumps: If the rate at
which fluid is flowing into a pump exceeds that at which it is flowing out, the pump will rupture,
while, if the inflow is less than the outflow, the pump must be creating fluid.

We now associate with each vertex $v \in V$ a number $b_f(v)$ which measures the balance of fluid
flow at the vertex: $b_f(v)$ equals the sum of all the flow rates for pipes out of v minus the sum of
all the flow rates for pipes into v. By the previous paragraph, $b_f(v) = 0$ if $v \in \mathcal{P}$. In Figure 6.5,
the nonzero values of $b_f$ are written by the depots. The fact that $b_f(D_2) = +2$ means that we must
constantly pump fluid into $D_2$ from outside the system at a rate of 2 liters/sec to keep the flow
rates as indicated. Likewise, we must constantly extract fluid from $D_4$ at the rate of 4 liters/sec to
maintain stability.

It  is  useful  to  summarize  some  of  the  above  ideas  with  a  precise  definition.  Note  that  we  do  not 
limit  ourselves  to  directed  graphs  which  are  simple. 

Definition 6.5 Flow in a digraph Let $G = (V, E, \varphi)$ be a directed graph. For $v \in V$,
define

$$\mathrm{IN}(v) = \{e \in E : \varphi(e) = (x, v) \text{ for some } x \in V\}$$

and

$$\mathrm{OUT}(v) = \{e \in E : \varphi(e) = (v, y) \text{ for some } y \in V\}.$$

Let f be a function from E to the nonnegative real numbers; i.e., $f: E \to \mathbb{R}^+$. Define $b_f: V \to \mathbb{R}$
by

$$b_f(v) = \sum_{e \in \mathrm{OUT}(v)} f(e) - \sum_{e \in \mathrm{IN}(v)} f(e).$$

Let $(\mathcal{D}, \mathcal{P})$ be an ordered partition of V into two sets. The function f will be called a flow with
respect to this partition if $b_f(v) = 0$ for all $v \in \mathcal{P}$. We call the function $b_f$ the balance of the
flow f.
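The balance function of Definition 6.5 translates directly into code. Below is a minimal sketch, assuming edges are stored by name with a (tail, head) pair playing the role of $\varphi$; the function names and the toy network are hypothetical and are not the network of Figure 6.5. Naming the edges allows parallel edges, in keeping with the remark that the digraph need not be simple.

```python
def balance(edges, flow):
    """Compute b_f(v): total flow out of v minus total flow into v.

    edges maps an edge name to its (tail, head) pair;
    flow maps an edge name to a nonnegative number.
    """
    b = {}
    for e, (tail, head) in edges.items():
        b[tail] = b.get(tail, 0) + flow[e]  # e belongs to OUT(tail)
        b[head] = b.get(head, 0) - flow[e]  # e belongs to IN(head)
    return b


def is_flow(edges, flow, pumps):
    """True when f is nonnegative and b_f(v) = 0 at every pump."""
    b = balance(edges, flow)
    nonnegative = all(flow[e] >= 0 for e in edges)
    return nonnegative and all(b.get(v, 0) == 0 for v in pumps)


# Hypothetical toy network: one pump P1 with two parallel pipes to D2.
edges = {"e1": ("D1", "P1"), "e2": ("P1", "D2"), "e3": ("P1", "D2")}
flow = {"e1": 3, "e2": 1, "e3": 2}
b = balance(edges, flow)
valid = is_flow(edges, flow, pumps={"P1"})
```

In this toy network 3 units enter at D1 and leave at D2, so the balances at the two depots are +3 and -3 while the pump balances to zero.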

You  may  have  noticed  that  our  discussion  of  flows  in  networks  is  missing  something  important, 
namely  the  capacities  of  the  pipes  to  carry  fluid.  In  Figure  6.6,  we  have  included  this  information. 
Attached  to  each  edge  is  a  dotted  semicircle  containing  the  maximum  amount  of  fluid  in  liters/sec 
that  can  flow  through  that  pipe.  This  is  the  capacity  of  the  edge  (pipe)  and  is  denoted  by  c;  e.g., 
c(Pi,  Da)  =  6  in  Figure  6.6.  The  capacity  c  is  a  function  from  E  to  the  set  of  positive  real  numbers. 
We  are  interested  in  flows  which  do  not  exceed  the  capacities  of  the  edges.  Realistically,  it  would 
also  be  necessary  to  specify  capacities  for  the  pumps  and  depots.  We  will  do  that  in  the  exercises, 
but  for  now  we'll  assume  they  have  a  much  larger  capacity  than  the  pipes  and  so  can  handle  any 
flow that the pipes can.


Figure 6.6 The network in Figure 6.5 with capacities c (in dotted semicircles), the flow from Figure 6.5,
and another admissible flow p (in parentheses).


The set $\mathcal{D}$ will now be divided into two subsets called the sources, where fluid can enter the
network,  and  the  sinks,  where  fluid  can  leave  the  network.  Our  goal  is  to  maximize  the  rate  at  which 
fluid  passes  through  the  network;  that  is  to  maximize  the  sum  of  bf{v)  over  all  sources.  We'll  present 
a  formal  definition  and  then  look  at  how  it  applies  to  Figure  6.6.  Since  we  will  spend  some  time 
working  with  Figure  6.6,  you  may  find  it  useful  to  make  some  copies  of  it  for  scratch  work. 

Definition 6.6 Some network flow terminology Let $G = (V, E, \varphi)$ be a directed graph.
Let c be a function from E to the nonnegative reals, called the capacity function for G. Let f
be a flow function on the directed graph $G = (V, E, \varphi)$ with vertex partition $(\mathcal{D}, \mathcal{P})$ and balance
function $b_f$, as defined in Definition 6.5. The flow f will be called admissible with respect to
the capacity function c if $f(e) \le c(e)$ for all edges e. Let $(\mathcal{D}_{\mathrm{in}}, \mathcal{D}_{\mathrm{out}})$ be an arbitrary ordered
partition of $\mathcal{D}$ into two nonempty sets. We call $\mathcal{D}_{\mathrm{in}}$ the set of source vertices for this partition
and $\mathcal{D}_{\mathrm{out}}$ the set of sink vertices. We define the value of f with respect to this partition to be

$$\mathrm{value}(f) = \sum_{v \in \mathcal{D}_{\mathrm{in}}} b_f(v).$$

An admissible flow f will be called maximum with respect to $(\mathcal{D}_{\mathrm{in}}, \mathcal{D}_{\mathrm{out}})$ if $\mathrm{value}(g) \le \mathrm{value}(f)$
for all other admissible flows g.

In general, the partitioning you are given of the set $\mathcal{D}$ of depots into the two subsets $\mathcal{D}_{\mathrm{in}}$ and $\mathcal{D}_{\mathrm{out}}$ is
completely arbitrary. Once this is done, the two sets will be kept the same throughout the problem
of maximizing value(f). It is sometimes convenient to write $(G, c, (\mathcal{D}_{\mathrm{in}}, \mathcal{D}_{\mathrm{out}}))$ to refer to the graph
G with the capacity function c and the "source-sink" partition $(\mathcal{D}_{\mathrm{in}}, \mathcal{D}_{\mathrm{out}})$. Our basic problem is,
given $(G, c, (\mathcal{D}_{\mathrm{in}}, \mathcal{D}_{\mathrm{out}}))$, find an admissible flow that is maximum.

In Figure 6.6 let $\{D_1, D_2\} = \mathcal{D}_{\mathrm{in}}$ and $\{D_3, D_4\} = \mathcal{D}_{\mathrm{out}}$. Intuitively, value(f) is the amount of
fluid in liters/sec that must be added to $D_1$ and $D_2$ to maintain the flow f. (The same amount
overflows at $D_3$ and $D_4$.) You have to be careful though! If you just pick some admissible flow f
without worrying about maximizing value(f), you might pick one with value(f) < 0, in which case
fluid will have to be extracted from the source $\mathcal{D}_{\mathrm{in}}$ to maintain the flow. It is only for the flow f
that maximizes value(f) that we can be sure that fluid is added to the source vertices (or at least
not extracted).

An Algorithm for Constructing a Maximum Flow


Now we'll study Figure 6.6 more carefully to help us formulate an algorithm for finding an admissible
flow function f which maximizes $\mathrm{value}(f) = b_f(D_1) + b_f(D_2)$, where $b_f$ is the balance function of
the function f.

In addition to the capacities shown on the edges of Figure 6.6 there are two sets of numbers.
One number has parentheses around it, such as (4) on the edge $(D_2, P_3)$. The other number does
not have parentheses, such as 1 on $(D_2, P_3)$. The parenthesized numbers define a flow, which we
call p for "parentheses," and the other numbers define a flow which we call f. Referring to the edge
$e = (D_2, P_3)$, f(e) = 1 and p(e) = 4. You should check that f and p satisfy the definitions of a
flow with respect to the depots $\mathcal{D}$ and pumps $\mathcal{P}$. Computing the values of f and p with respect to
$\mathcal{D}_{\mathrm{in}} = \{D_1, D_2\}$ we obtain value(f) = 3 + 2 = 5 and value(p) = 5 + 5 = 10. Thus p has the higher
value. In fact, we will later prove that p is a flow of maximum value with respect to the set of sources
$\mathcal{D}_{\mathrm{in}}$.

To begin with, concentrate on the flow f. Go to Figure 6.6 and follow the edges connecting the
sequence of vertices $(D_2, P_3, P_4, D_3)$. They form an (undirected) path on which you first go forward
along the edge $(D_2, P_3)$, then backwards along the edge $(P_4, P_3)$, then backwards along the edge
$(D_3, P_4)$. Note that for each forward edge e, f(e) < c(e) and for each backward edge e, f(e) > 0.
These conditions, f(e) < c(e) on forward edges and f(e) > 0 on backward edges, are very important
for the general case, which we discuss later.

For each forward edge $e$, define $\delta(e) = c(e) - f(e)$. For each backward edge $e$, define $\delta(e) = f(e)$. In our particular path we have $\delta(D_2, P_3) = \delta(P_4, P_3) = \delta(D_3, P_4) = 2$. Let $\delta$ denote the minimum value of $\delta(e)$ over all edges in the path. In our case $\delta = 2$. We now define a new flow $g$ based on $f$, the path, and $\delta$:

• For each forward edge $e$ on the path, add $\delta$ to $f(e)$ to get $g(e)$.

• For each backward edge $e$ on the path, subtract $\delta$ from $f(e)$ to get $g(e)$.

• For all edges $e$ not on the path, $g(e) = f(e)$.

This process is called "augmenting the flow" along the path. You should convince yourself that in our example, $\mathrm{value}(g) = \mathrm{value}(f) + \delta = \mathrm{value}(f) + 2$. This type of relation will be true in general.
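The augmentation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `augment_along_path`, the dictionary representation of capacities and flows, and the small four-edge network below are all hypothetical (the network is not Figure 6.6, although its path also goes forward once and backward twice with every $\delta(e) = 2$).

```python
def augment_along_path(capacity, flow, path):
    """Return (new_flow, delta) after augmenting `flow` along `path`.

    capacity, flow: dicts mapping directed edges (u, v) to numbers.
    path: list of vertices; each consecutive pair must be joined by a
    forward edge with f(e) < c(e) or a backward edge with f(e) > 0.
    """
    steps = []  # entries are (edge, is_forward, delta(e))
    for u, v in zip(path, path[1:]):
        if (u, v) in capacity and flow[(u, v)] < capacity[(u, v)]:
            steps.append(((u, v), True, capacity[(u, v)] - flow[(u, v)]))
        elif (v, u) in capacity and flow[(v, u)] > 0:
            steps.append(((v, u), False, flow[(v, u)]))
        else:
            raise ValueError(f"path is not augmentable between {u} and {v}")
    delta = min(d for _, _, d in steps)  # the increment of the path
    new_flow = dict(flow)                # edges off the path keep their flow
    for edge, is_forward, _ in steps:
        new_flow[edge] += delta if is_forward else -delta
    return new_flow, delta


# Hypothetical toy network: source s, pumps a and b, sinks t1 and t2.
capacity = {("s", "a"): 3, ("b", "a"): 5, ("t1", "b"): 4, ("a", "t2"): 3}
flow = {("s", "a"): 1, ("b", "a"): 2, ("t1", "b"): 2, ("a", "t2"): 3}

# Forward along (s, a), then backward along (b, a) and (t1, b).
new_flow, delta = augment_along_path(capacity, flow, ["s", "a", "b", "t1"])
print(delta, new_flow)
```

In this toy network the flow out of the source `s` rises from 1 to 3, that is, by the increment $\delta = 2$, and conservation still holds at the pumps `a` and `b`.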

If you now study Figure 6.6, you should be able to convince yourself that there is no path from $\mathcal{D}_{\text{in}}$ to $\mathcal{D}_{\text{out}}$ along which $p$ can be augmented. This observation is the key to proving that $p$ is a maximum flow: We will see that maximum flows are those which cannot be augmented in this way.

We  are  now  ready  for  some  definitions  and  proofs.  The  next  definition  is  suggested  by  our 
previous  discussion  of  augmenting  flows. 

Definition 6.7 Let $f$ be an admissible flow function for $(G, c, (\mathcal{D}_{\text{in}}, \mathcal{D}_{\text{out}}))$. Suppose that $(v_1, v_2, \ldots, v_k)$ is an undirected path in $G$ with $v_1 \in \mathcal{D}_{\text{in}}$ and $v_i \notin \mathcal{D}_{\text{in}}$ for $i > 1$. If for each forward edge $e$ in this path, $f(e) < c(e)$ and for each backward edge $e$, $f(e) > 0$ then we say that the path is augmentable. If in addition $v_k \in \mathcal{D}_{\text{out}}$ then we say that the path is a complete augmentable path. Let $\delta(e) = f(e)$ if $e$ is a backward edge on the path and let $\delta(e) = c(e) - f(e)$ if $e$ is a forward edge. The minimum value of $\delta(e)$ over all edges of the path, denoted by $\delta$, will be called the increment of the path. Let $A(f)$ be those vertices that lie on some augmentable path of $f$, together with all the vertices in $\mathcal{D}_{\text{in}}$.

In Figure 6.6, with respect to the admissible flow $f$, the path $(D_1, P_1, P_3)$ is augmentable. The path $(D_2, P_2, P_3, P_4, D_3)$ is a complete augmentable path. So is the path $(D_2, P_3, P_4, D_3)$.

Theorem 6.6 Augmentable Path Theorem A flow $f$ is a maximum flow if and only if it has no complete augmentable path; that is, if and only if $A(f) \cap \mathcal{D}_{\text{out}} = \emptyset$.
Proof: If such a complete augmentable path exists, use it to augment $f$ by $\delta$ and obtain a new flow $p$. Since the first vertex on the path lies in $\mathcal{D}_{\text{in}}$ and, by definition, no other vertex of the path does, it follows that $\mathrm{value}(p) = \mathrm{value}(f) + \delta > \mathrm{value}(f)$. Therefore $f$ was not a maximum flow.

Now suppose that no complete augmentable path exists for the flow $f$. Let $A = A(f)$ and $B = V - A$. We will now consider what happens to flows on edges between $A$ and $B$. For this purpose, it will be useful to have a bit of notation. For $C$ and $D$ subsets of $V$, let $\mathrm{FROM}(C, D)$ be all $e \in E$ with $\varphi(e) \in C \times D$; i.e., the edges from $C$ to $D$. We claim:

1. For any flow $g$,

\[ \mathrm{value}(g) = \sum_{e \in \mathrm{FROM}(A,B)} g(e) \; - \sum_{e \in \mathrm{FROM}(B,A)} g(e). \tag{6.11} \]

2. For the flow $f$, if $e \in \mathrm{FROM}(A, B)$ then $f(e) = c(e)$, and if $e \in \mathrm{FROM}(B, A)$ then $f(e) = 0$.

The proofs of the claims are left as exercises. Suppose the claims are proved. Since $0 \le g(e) \le c(e)$ for any flow $g$, it follows that

\[ \begin{aligned} \mathrm{value}(g) &= \sum_{e \in \mathrm{FROM}(A,B)} g(e) \; - \sum_{e \in \mathrm{FROM}(B,A)} g(e) \\ &\le \sum_{e \in \mathrm{FROM}(A,B)} c(e) \; - \sum_{e \in \mathrm{FROM}(B,A)} 0 \\ &= \mathrm{value}(f), \end{aligned} \]

where the last equality is (6.11) applied to $f$ together with Claim 2.

Since $g$ was any flow whatsoever, it follows that $f$ is a maximum flow. □


This theorem contains the general idea for an algorithm that computes a maximum flow for $(G, c, (\mathcal{D}_{\text{in}}, \mathcal{D}_{\text{out}}))$. The first thing to do is to choose an admissible flow $f$. The flow $f(e) = 0$ for all $e$ will always do. Usually, by inspection, you can do better than that. The general idea is that, given a flow $f$ such that $A(f) \cap \mathcal{D}_{\text{out}} \ne \emptyset$, we can find a complete augmentable path and use that path to produce a flow of higher value. Here's the procedure:

/* The main procedure */
Procedure maxflow
    Set f(e) = 0 for all e.
    While A(f) ∩ D_out ≠ ∅, augment(f).
    Return f.
End

/* Replace f with a bigger flow. */
Procedure augment(f)
    Find a complete augmentable path (v_1, v_2, ..., v_k).
    Compute the increment δ of this path.
    If e is a forward edge of the path, set f(e) = f(e) + δ.
    If e is a backward edge of the path, set f(e) = f(e) − δ.
    Return f.
End

We have left it up to you to do such nontrivial things as decide if $A(f) \cap \mathcal{D}_{\text{out}}$ is empty and to
find  a  complete  augmentable  path.  In  the  examples  and  exercises  in  this  book  it  will  be  fairly  easy 
to  do  these  things.  On  larger  scale  problems,  the  efficiency  of  the  algorithms  used  for  these  things 
can  be  critical.  This  is  a  topic  for  a  course  in  data  structures  and/or  algorithm  design  and  is  beyond 
the  scope  of  this  book. 
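For readers who want to experiment, here is one minimal way to fill in those two steps in Python. It is a sketch under assumptions, not the method of the text: a breadth-first search is used to look for a complete augmentable path (so it happens to find a shortest one), the names `find_complete_augmentable_path`, `maxflow` and `value` and the dictionary representation are hypothetical, sources and sinks are assumed disjoint, and the five-edge network is made up for illustration.

```python
from collections import deque


def find_complete_augmentable_path(capacity, flow, sources, sinks):
    """Breadth-first search from the sources.

    Returns (steps, None) if a sink is reached, where steps is a list of
    (edge, is_forward) pairs; otherwise returns (None, A) where A is the
    set of reached vertices, i.e. A(f).
    """
    parent = {s: None for s in sources}  # vertex -> (previous, edge, is_forward)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if u in sinks:
            steps = []
            while parent[u] is not None:
                prev, edge, is_forward = parent[u]
                steps.append((edge, is_forward))
                u = prev
            return steps[::-1], None
        for (a, b), cap in capacity.items():
            if a == u and b not in parent and flow[(a, b)] < cap:
                parent[b] = (u, (a, b), True)    # forward edge with slack
                queue.append(b)
            elif b == u and a not in parent and flow[(a, b)] > 0:
                parent[a] = (u, (a, b), False)   # backward edge carrying flow
                queue.append(a)
    return None, set(parent)


def maxflow(capacity, sources, sinks):
    """Start from the zero flow and augment until A(f) misses every sink."""
    flow = {e: 0 for e in capacity}
    while True:
        steps, reached = find_complete_augmentable_path(capacity, flow, sources, sinks)
        if steps is None:
            return flow, reached
        delta = min(capacity[e] - flow[e] if fwd else flow[e] for e, fwd in steps)
        for e, fwd in steps:
            flow[e] += delta if fwd else -delta


def value(flow, sources):
    """Net flow out of the sources."""
    out_flow = sum(x for (a, b), x in flow.items() if a in sources)
    in_flow = sum(x for (a, b), x in flow.items() if b in sources)
    return out_flow - in_flow


# Hypothetical example network with one source s and one sink t.
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
flow, reached = maxflow(capacity, {"s"}, {"t"})
print(value(flow, {"s"}), reached)
```

The second value returned by `maxflow` is the set $A(f)$ for the final flow, which is exactly the set used in the proof of the Augmentable Path Theorem. Scanning every edge at each step keeps the sketch short; an adjacency list would be the usual choice for larger networks.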
Figure 6.7 The network for Exercise 6.4.1.


It is possible that, like Zeno's paradox, our algorithm will run forever: augment could simply produce a flow $f$ with $\mathrm{value}(f)$ halfway between the value of the flow it was given and the value of a maximum flow. In fact, the algorithm always stops when the capacities are rational, and it can be made to stop for arbitrary real capacities by choosing the augmentable paths carefully (for example, by always using a shortest one). We'll prove a weaker version:

Theorem 6.7 Integer Flow Theorem If the capacities of a network are all integers, then the maxflow algorithm stops after a finite number of steps and produces a maximum flow which assigns integer flows to the edges.

Proof: We claim that the calculations in the algorithm involve only integer values for $f$ and $\delta$. This can be proved by induction: Before any iterations, $f$ is an integer valued function. Suppose that we call augment($f$) with an integer valued $f$. Since $\delta$ is a minimum of numbers of the form $f(e)$ and $c(e) - f(e)$, which are all integers and are all positive on an augmentable path, $\delta$ is a positive integer. Thus the new $f$ is integer valued and has a value at least one larger than the old $f$. Thus, after $n$ steps, $\mathrm{value}(f) \ge n$. If a maximum flow has value $F$, then a maximum flow is reached after at most $F$ steps. □

Although  this  algorithm  stops,  it  is  a  poor  algorithm.  Quite  a  bit  of  work  has  been  done  on 
finding  fast  network  flow  algorithms.  Unfortunately,  improvements  usually  lead  to  more  complex 
algorithms  that  use  more  complicated  data  structures.  One  easy  improvement  is  to  use  a  shortest 
complete  augmentable  path  in  augment.  This  leads  to  an  easily  programmed  algorithm  which  often 
runs  fairly  quickly.  Another  poor  feature  of  our  algorithm  lies  in  the  fact  that  all  the  calculations 
needed  to  find  an  augmentable  path  are  thrown  away  after  the  path  has  been  found.  With  just  a 
little  more  work,  one  may  be  able  to  do  several  augmentations  at  the  same  time.  The  worst  case 
behavior of the resulting algorithm is good, namely $O(|V|^3)$. Unfortunately, it is rather complicated
so  we  will  not  discuss  it  here. 
Figure 6.8 Converting pump capacity to edge capacity for Exercise 6.4.4.


Exercises 


6.4.1.  The  parts  of  this  problem  all  refer  to  Figure  6.7. 

(a) With $\mathcal{D}_{\text{in}} = \{v, w, x, y, z\}$ and $\mathcal{D}_{\text{out}} = \{a, b, c, d, e\}$, find a maximum flow. Also, find a minimum cut set and verify that its capacity is equal to the value of your maximum flow.

(b) Returning to the previous part, find a different maximum flow and a different minimum cut set.

(c) With $\mathcal{D}_{\text{in}} = \{v\}$ and $\mathcal{D}_{\text{out}} = \{e\}$, find a maximum flow. Also, find a minimum cut set and verify that its capacity is equal to the value of your maximum flow.

(d)  In  the  previous  part,  is  the  maximum  flow  unique?  Is  the  minimum  cut  set  unique? 

6.4.2. Prove Claim 2 in the proof of the Augmentable Path Theorem.

Hint. Prove that $f(e) < c(e)$ for $e \in \mathrm{FROM}(A, B)$ implies that the ends of $e$ are in $A$, a contradiction. Do something similar for $\mathrm{FROM}(B, A)$.

6.4.3. Prove (6.11).

Hint. Prove that $\mathrm{value}(g) = \sum_{v \in A} b_g(v)$ and that each edge $e$ with both ends in $A$ contributes both $g(e)$ and $-g(e)$ to $\sum_{v \in A} b_g(v)$.

6.4.4. We didn't consider the capacities of the pumps in dealing with Figure 6.6. Essentially, we assumed that the pumps could handle any flow rates that might arise. Suppose that pumps $P_1$, $P_2$ and $P_3$ are having mechanical problems and can only pump 3 liters/sec. What is a maximum flow for $(G, c, (\mathcal{D}_{\text{in}}, \mathcal{D}_{\text{out}}))$ with $\mathcal{D}_{\text{in}} = \{D_1, D_2\}$ and $\mathcal{D}_{\text{out}} = \{D_3, D_4\}$? Figure 6.8 shows how to convert a pump capacity into an edge capacity.

6.4.5. Consider again the network flow problem of Figure 6.6. The problem was defined by $N = (G, c, (\mathcal{D}_{\text{in}}, \mathcal{D}_{\text{out}}))$ where $\mathcal{D}_{\text{in}} = \{D_1, D_2\}$ and $\mathcal{D}_{\text{out}} = \{D_3, D_4\}$. Imagine that two new depots are created as shown in Figure 6.9, and all of the original depots are converted into pumping stations. Let $N' = (G', c', (\mathcal{D}'_{\text{in}}, \mathcal{D}'_{\text{out}}))$ denote this new problem, where $\mathcal{D}'_{\text{in}} = \{D_0\}$ and $\mathcal{D}'_{\text{out}} = \{D_5\}$.

(a) What are the smallest values of $c'(D_0, P_1) = c_1$, $c'(D_0, P_2) = c_2$, $c'(P_3, D_5) = c_3$ and $c'(P_4, D_5) = c_4$ that guarantee that if $p'$ is a maximum flow for $N'$ then $p'$ restricted to the edges of $G$ is a maximum flow for $N$? Explain.

(b) With your choices of $c_i$, will it be true that any maximum flow on $N$ can be used to get a maximum flow on $N'$? Explain.

*Cut  Partitions  and  Cut  Sets 


The  "Max-Flow  Min-Cut  Theorem"  is  closely  related  to  our  augmentable  path  theorem;  however, 
unlike  that  theorem,  it  does  not  lead  immediately  to  an  algorithm  for  finding  the  maximum  flow. 
Instead,  its  importance  is  primarily  in  the  theoretical  aspects  of  the  subject.  It  is  an  example  of 
a  "duality"  theorem,  many  of  which  are  related  to  one  another.  If  you  are  familiar  with  linear 
programming,  you  might  like  to  know  that  this  duality  theorem  can  be  proved  from  the  linear 
programming  duality  theorem.  We  need  a  definition. 
Figure 6.9 New depots for Exercise 6.4.5.


Definition 6.8 Cut partition Given $(G, c, (\mathcal{D}_{\text{in}}, \mathcal{D}_{\text{out}}))$, any ordered partition $(A, B)$ of $V$ with $\mathcal{D}_{\text{in}} \subseteq A$ and $\mathcal{D}_{\text{out}} \subseteq B$ will be called a cut partition. A cut set is a subset $F$ of the edges of $G$ such that every directed path from $\mathcal{D}_{\text{in}}$ to $\mathcal{D}_{\text{out}}$ contains an edge of $F$. If $F$ is a set of edges in $G$, the sum of $c(e)$ over all $e \in F$ is called the capacity of $F$. If $(A, B)$ is a cut partition, we write $c(A, B)$ instead of $c(\mathrm{FROM}(A, B))$ and call it the capacity of the cut partition.

The  following  lemma  shows  that  cut  partitions  and  cut  sets  are  closely  related. 

Lemma If $(A, B)$ is a cut partition, $\mathrm{FROM}(A, B)$ is a cut set. Conversely, if $F$ is a cut set, then there is a cut partition $(A, B)$ with $\mathrm{FROM}(A, B) \subseteq F$.

Proof: This is left as an exercise. □

Theorem 6.8 Max-Flow Min-Cut Theorem Let $f$ be any flow and $(A, B)$ any cut partition for $(G, c, (\mathcal{D}_{\text{in}}, \mathcal{D}_{\text{out}}))$. Then

\[ \mathrm{value}(f) \le c(A, B) \]

and, if $f$ is a maximum flow, then there is a cut partition $(A, B)$ such that $\mathrm{value}(f) = c(A, B)$. The results are valid if we replace the cut partition $(A, B)$ with the cut set $F$.

Proof: The inequality $\mathrm{value}(f) \le c(A, B)$ follows immediately from (6.11) and the fact that $f$ takes on only nonnegative values. Suppose that $f$ is a maximum flow. Let $A = A(f)$ (and, therefore, $B = V - A(f)$). It follows from the claim following (6.11) that $\mathrm{value}(f) = c(A, B)$. To change from cut partition to cut set, apply the lemma. □

Why is this called the Max-Flow Min-Cut Theorem? The inequality $\mathrm{value}(f) \le c(A, B)$ implies that the maximum $\mathrm{value}(f)$ over all possible admissible flows $f$ is less than or equal to the minimum value of $c(A, B)$ over all possible cut partitions $(A, B)$. The fact that equality holds for maximum flows and certain cut partitions says that "The maximum value over all flows is equal to the minimum capacity over all cut partitions."
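On a small network the theorem can be checked by brute force. The following Python sketch enumerates every cut partition of a made-up five-edge network and compares the smallest capacity with the value of a flow written down by hand. The network, the flow `f`, and the helper names `cut_capacity` and `min_cut_partition` are all assumptions introduced for this illustration.

```python
from itertools import combinations


def cut_capacity(capacity, A):
    """c(A, B): total capacity of the edges from A to its complement B."""
    return sum(c for (u, v), c in capacity.items() if u in A and v not in A)


def min_cut_partition(capacity, sources, sinks):
    """Try every cut partition (A, B); return (smallest capacity, its A)."""
    vertices = {v for edge in capacity for v in edge}
    free = sorted(vertices - set(sources) - set(sinks))
    best = None
    for r in range(len(free) + 1):
        for extra in combinations(free, r):
            A = set(sources) | set(extra)   # sources in A, sinks stay in B
            cap = cut_capacity(capacity, A)
            if best is None or cap < best[0]:
                best = (cap, A)
    return best


# Hypothetical network with source s and sink t, and a flow f on it.
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
f = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}

value_f = sum(x for (u, v), x in f.items() if u == "s")  # no edges enter s here
min_cap, A = min_cut_partition(capacity, {"s"}, {"t"})
print(value_f, min_cap, A)
```

Since the value of `f` equals the smallest cut capacity, the inequality in the theorem shows that `f` is a maximum flow for this network. The enumeration is exponential in the number of vertices, so it is only useful for checking small examples.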
Figure 6.10 The network for $n = 4$, $A_1 = \{s_1, s_2\}$, $A_2 = \{s_1, s_3, s_4\}$, $A_3 = \{s_1, s_5\}$ and $A_4 = \{s_3, s_5\}$. Capacities appear on the left side. [Figure omitted: the drawing labels a layer of vertices "the set $S$", a second layer of vertices for the indices, and edge capacities $c = 1$ and $c = M$.]

Example 6.11 Systems of distinct representatives Let $S$ be a finite set and suppose that $A_i \subseteq S$ for $1 \le i \le n$. A list $a_1, \ldots, a_n$ is called a system of representatives for the $A_i$'s if $a_i \in A_i$ for all $i$. If the $a_i$'s are distinct, we call the list a system of distinct representatives for the $A_i$'s.

Systems of distinct representatives are useful in a variety of situations. A classical example is the marriage problem: There are $n$ tasks and a set $S$ of resources (e.g., employees, computers, delivery trucks). Each resource can be used to carry out some of the tasks; however, we must devote one resource to each task. If $A_i$ is the set of resources that could carry out the $i$th task, then a system of distinct representatives is an assignment of resources to tasks. An extension of this, which we will not study, assigns a value for each resource-task pair. A higher value means a greater return from the resource-task pairing due to increased speed, capability, or whatever. The assignment problem asks for an assignment that maximizes the sum of the values.

We  will  use  the  Max-Flow  Min-Cut  Theorem  to  prove  the  following  result,  which  is  also  called 
the  Philip  Hall  Theorem  and  the  SDR  Theorem. 

Theorem 6.9 Marriage Theorem With the notation given above, a system of distinct representatives (SDR) exists if and only if

\[ \Bigl|\bigcup_{i \in I} A_i\Bigr| \ge |I| \tag{6.12} \]

for every subset of indices $I \subseteq \underline{n}$. In other words, every collection of $A_i$'s contains at least as many distinct $a_j$'s as there are $A_i$'s in the collection.

Proof: By renaming the elements of $S$ if necessary, we may assume that $S$ contains no integers or $\infty$. Let $G$ be the simple digraph with $V = S \cup \underline{n} \cup \{0, \infty\}$ and edges of three kinds:

• $(0, s)$ for all $s \in S$;

• $(i, \infty)$ for all $i \in \underline{n}$;

• $(s, i)$ for all $s \in S$ and $i \in \underline{n}$ such that $s \in A_i$.

Let all edges of the form $(s, i)$ have capacity $M$, a very large integer, and let all other edges have capacity 1. Let $\mathcal{D}_{\text{in}} = \{0\}$ and $\mathcal{D}_{\text{out}} = \{\infty\}$. Such a network is shown in Figure 6.10.

Consider a flow $f$ which is integer valued. Since a vertex $s \in S$ has one edge directed in and that edge has capacity 1, the flow out of $s$ cannot exceed 1. Similarly, since a vertex $i \in \underline{n}$ has one edge directed out and that edge has capacity 1, the flow into $i$ cannot exceed 1. It also follows that $f$ takes on only the values 0 and 1.

We can interpret the edges that have $f(e) = 1$:

• $f(0, s) = 1$ for $s \in S$ means $s$ is used as a representative;

• $f(i, \infty) = 1$ for $i \in \underline{n}$ means $A_i$ has a representative;

• $f(s, i) = 1$ for $s \in S$ and $i \in \underline{n}$ means $s$ is the representative of $A_i$.

You should convince yourself that this interpretation provides a bijection between integer valued flows $f$ and systems of distinct representatives for some of the $A_i$'s, viz., those for which $f(i, \infty) = 1$. To do this, note that for a given $s \in S$, $f(s, i) = 1$ for at most one $i \in \underline{n}$ because the flow into $s$ is at most 1. Experiment with integer flows and partial systems of distinct representatives in Figure 6.10 to clarify this.

From the observation in the previous paragraph, $\mathrm{value}(f)$ is the number of sets for which distinct representatives have been found. Thus a system of distinct representatives is associated with a flow $f$ with $\mathrm{value}(f) = n$. If we can understand what a minimum capacity cut set looks like, we may be able to use the Max-Flow Min-Cut Theorem to complete the proof.

What can we say about a minimum capacity cut set $F$? Note that $F$ contains no edges of the form $(s, i)$ because of their large capacity. Thus $c(e) = 1$ for all $e \in F$ and so $c(F) = |F|$. Consequently, we are concerned with the minimum of $|F|$ over all cut sets $F$ containing no edges of the form $(s, i)$. Thus $F$ contains edges of the form $(0, s)$ and/or $(i, \infty)$.

Let $I$ be those $i \in \underline{n}$ such that $(i, \infty) \notin F$. What edges of the form $(0, s)$ are needed to form a cut set? If $i \in \underline{n}$ and $s \in A_i$, then we must have an edge from the path $0, s, i, \infty$ in the cut set. Thus, $i \in I$ implies that $(0, s) \in F$. It follows that $(0, s) \in F$ for every $s \in \bigcup_{i \in I} A_i$.

This is enough to form a cut set: Suppose $0, s, i, \infty$ is a path. If $i \notin I$, then $(i, \infty) \in F$. If $i \in I$, then $s \in \bigcup_{i \in I} A_i$ and so $(0, s) \in F$.

What is $|F|$ in this case? (Figure 6.10 may help make the following discussion clearer.) We have $n - |I|$ edges of the form $(i, \infty)$ and $\bigl|\bigcup_{i \in I} A_i\bigr|$ edges of the form $(0, s)$. Thus $|F|$ is the sum of these and so the minimum capacity is

\[ \min_{I \subseteq \underline{n}} \Bigl\{ \Bigl|\bigcup_{i \in I} A_i\Bigr| + n - |I| \Bigr\} = n + \min_{I \subseteq \underline{n}} \Bigl\{ \Bigl|\bigcup_{i \in I} A_i\Bigr| - |I| \Bigr\}. \tag{6.13} \]

By the Max-Flow Min-Cut Theorem, a system of distinct representatives will exist if and only if this is at least $n$. Consequently, the expression in the right hand set of braces of (6.13) must be nonnegative for all $I \subseteq \underline{n}$. □
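The theorem can be tried out on the sets of Figure 6.10. The Python sketch below is an illustration under assumptions: `find_sdr` searches for representatives by repeatedly re-assigning earlier choices, which plays the role of augmenting a flow in the network of the proof (it is a standard bipartite matching search, not code from the text), and `hall_condition` tests (6.12) directly by brute force. Both function names are hypothetical.

```python
from itertools import combinations


def find_sdr(sets):
    """Return a list of distinct representatives, one per set, or None."""
    owner = {}  # element of S -> index i of the set it currently represents

    def assign(i, seen):
        # Try to give set i a representative, re-assigning others if needed.
        for s in sorted(sets[i]):
            if s in seen:
                continue
            seen.add(s)
            if s not in owner or assign(owner[s], seen):
                owner[s] = i
                return True
        return False

    for i in range(len(sets)):
        if not assign(i, set()):
            return None
    rep = {i: s for s, i in owner.items()}
    return [rep[i] for i in range(len(sets))]


def hall_condition(sets):
    """Check that the union of the A_i with i in I has at least |I| elements
    for every nonempty index set I (the empty set is trivial)."""
    n = len(sets)
    for r in range(1, n + 1):
        for I in combinations(range(n), r):
            if len(set().union(*(sets[i] for i in I))) < r:
                return False
    return True


# The sets A_1, ..., A_4 from the caption of Figure 6.10.
A = [{"s1", "s2"}, {"s1", "s3", "s4"}, {"s1", "s5"}, {"s3", "s5"}]
sdr = find_sdr(A)
print(sdr, hall_condition(A))

# Two sets competing for a single element: no SDR, and (6.12) fails.
bad = [{"a"}, {"a"}]
print(find_sdr(bad), hall_condition(bad))
```

The brute-force test of (6.12) looks at all $2^n - 1$ nonempty index sets, so it is practical only for small $n$, whereas the search in `find_sdr` stays fast for much larger families.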


Exercises 


6.4.6.  Prove  the  lemma  about  cut  partitions. 

*6.4.7. Prove that for a given max-flow problem, $A(f)$ is the same for all maximum flows $f$.

6.4.8. For $r \le n$, an $r \times n$ Latin rectangle is an $r \times n$ array in which each row is a permutation of $\underline{n}$ and each column contains no element more than once. If $r = n$, each column must therefore be a permutation of $\underline{n}$. Such a configuration is called a Latin square. The goal of this exercise is to prove that it is always possible to add $n - r$ rows to such an $r \times n$ Latin rectangle to obtain a Latin square.

(a) Suppose we are given an $r \times n$ Latin rectangle $L$ with $r < n$. In the notation for systems of distinct representatives, let $S = \underline{n}$ and let $A_i$ be those elements of $S$ that do not appear in the $i$th column of $L$. Prove that a system of distinct representatives could be appended to $L$ to obtain an $(r+1) \times n$ Latin rectangle.

(b) Prove that each $s \in S$ appears in exactly $n - r$ of the $A_i$'s.

(c) Use the previous result and $|A_i| = n - r$ to prove that $\bigl|\bigcup_{i \in I} A_i\bigr| \ge |I|$ and so conclude that a system of distinct representatives exists.

(d) Use induction on $n - r$ to prove that an $r \times n$ Latin rectangle can be "completed" to a Latin square.
6.4.9. The purpose of this exercise is to prove the Marriage Theorem without using flows in networks. The proof will be by induction on $n$.

(a) Prove the theorem for $n = 1$.

(b) For the induction step, consider two cases, either (6.12) is strict for all $I \subsetneq \underline{n}$ or it is not. Prove the induction step in the case of strictness by proving that we may choose any $a \in A_n$ as the representative of $A_n$.

(c) Suppose that equality holds in (6.12) for $I \subsetneq \underline{n}$. Let $X = \bigcup_{i \in I} A_i$ and $B_i = A_i - X$, the set of those elements of $A_i$ which are not in $X$. Prove that

\[ \Bigl|\bigcup_{i \in R} B_i\Bigr| \ge |R| \]

for all $R \subseteq (\underline{n} - I)$. Use the induction hypothesis twice to obtain a system of distinct representatives.

*6.4.10. Let $G$ be a directed graph and let $u$ and $v$ be two distinct vertices in $G$. Suppose that $(u, v)$ is not an edge of $G$. A set of directed paths from $u$ to $v$ in $G$ is called "edge disjoint" if none of the paths share an edge. A set $F$ of edges of $G$ is called an "edge cutset" for $u$ and $v$ if every directed path from $u$ to $v$ in $G$ contains an edge in $F$. Prove that the cardinality of the largest set of edge disjoint paths equals the cardinality of the smallest edge cutset.
Hint. Make a network of $G$ with source $u$ and sink $v$.

*6.4.11. State and prove a result like that in Exercise 6.4.10 for graphs.

*6.4.12. Using the idea in Exercise 6.4.4 state and prove results like the two previous exercises with "edge" replaced by "vertex other than $u$ and $v$" in the definitions of disjoint and cutset. The undirected result is called Menger's theorem.
Probability theory is used in two different ways in combinatorics.

It  can  be  used  to  show  that,  if  something  is  large  enough,  then  it  must  have  some  property.  For 
example,  it  was  shown  in  Example  1.25  (p.  29)  that  certain  error  correcting  codes  must  exist  if  the 
code  words  were  long  enough.  Estimates  obtained  this  way  are  often  quite  far  from  best  possible. 
On  the  other  hand,  better  estimates  may  be  hard  to  find. 

Probability  theory  is  also  used  to  study  how  random  objects  behave.  For  example,  what  is  the 
probability  that  a  "random  graph"  with  n  vertices  and  n  —  1  edges  is  a  tree? 

What do we mean by random graphs? It may be more instructive to ask "What are some questions asked about random graphs?" Here are some examples, sometimes a bit vaguely stated. You should think of $n$ as large and the graphs as simple.

• What is the probability that a random $n$-vertex, $(n-1)$-edge graph is a tree?

• How many edges must an $n$-vertex random graph have so that it is likely to be connected?

• On average, how many leaves does an $n$-vertex tree possess and how far is a random tree likely to differ from this average?

• How many colors are we likely to need to color a random $n$-vertex graph that has $kn$ edges?

• How can we generate graphs at random so that they resemble the graph of connections of computers on the internet?

To  answer  questions  like  these,  we  must  be  clear  on  what  is  meant  by  "random"  and  "likely". 
Sometimes  this  can  get  rather  technical;  however,  there  are  simple  examples. 
To begin, we usually consider a set $S_n$ of (simple) graphs with $n$ vertices and make it into a probability space. An obvious way to do this is with the uniform probability. For example, let $S_n$ be the set of $(n-1)$-edge simple graphs with vertex set $\underline{n}$ and let $\Pr$ be the uniform distribution. What is the probability that a random such graph is a tree? By Exercise 5.1.2 (p. 124), there are $\binom{N}{k}$ $k$-edge graphs, where $N = \binom{n}{2}$. Since a tree has $n - 1$ edges, we want $k = n - 1$. By Example 5.10 (p. 143), there are $n^{n-2}$ trees. Thus the answer is

\[ \frac{n^{n-2}}{\binom{N}{n-1}}. \]

Example 6.12 Graphs with few edges Suppose a graph has few edges. What are some properties we can expect to see?

To answer such questions, we need a probability space. Let $\mathcal{G}(n, k)$ be the probability space gotten by taking the uniform probability on the set of $k$-edge simple graphs with vertex set $\underline{n}$. For convenience, let $N = \binom{n}{2}$.

If our graph has $n - 1$ edges, it could be a tree; however, we'll show that this is a rare event. (You were asked to estimate this probability in Exercise 5.5.15(b). We'll do it here.) The number of such graphs is

\[ \begin{aligned} \binom{N}{n-1} &= \frac{N(N-1)\cdots(N-(n-1)+1)}{(n-1)!} > \frac{(N-(n-1))^{n-1}}{(n-1)!} \\ &= \frac{((n-2)(n-1)/2)^{n-1}}{(n-1)!} \\ &\sim \frac{((n-2)(n-1)/2)^{n-1}}{\sqrt{2\pi(n-1)}\,((n-1)/e)^{n-1}} \qquad \text{by Stirling's formula} \\ &\sim \frac{(e(n-2)/2)^{n-1}}{\sqrt{2\pi n}}. \end{aligned} \]

Since there are $n^{n-2}$ trees, for large $n$ the probability that a random graph in $\mathcal{G}(n, n-1)$ is a tree is less than

\[ \frac{n^{n-2}\sqrt{2\pi n}}{(e(n-2)/2)^{n-1}} = \sqrt{2\pi/n}\,(2/e)^{n-1}\left(1 + \frac{2}{n-2}\right)^{n-1}. \tag{6.14} \]

Note that $\left(1 + \frac{2}{n-2}\right)^{n-2} \to e^2$ as $n \to \infty$. Since $2/e < 1$, (6.14) goes to zero rapidly as $n$ gets large. Thus trees are rare.

You should be able to show that a simple graph with $n$ vertices and $n - 1$ edges that is not a tree is not connected and has cycles. That leads naturally to two questions:

• How large must $k$ be so that most graphs in $\mathcal{G}(n, k)$ have cycles?

• How large must $k$ be so that most graphs in $\mathcal{G}(n, k)$ are connected?

The first question will be looked at some in the exercises. We'll look at the second question a bit later. □
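To see how quickly trees become rare, one can compare the exact probability $n^{n-2}/\binom{N}{n-1}$ with the right side of (6.14) numerically. The short Python sketch below does this; the function names `tree_probability` and `bound_6_14` are hypothetical, and the values of $n$ are chosen only for illustration.

```python
from math import comb, e, pi, sqrt


def tree_probability(n):
    """Exact probability that a uniform (n-1)-edge graph on n vertices is a tree."""
    N = comb(n, 2)
    return n ** (n - 2) / comb(N, n - 1)


def bound_6_14(n):
    """Right side of (6.14); needs n >= 3."""
    return sqrt(2 * pi / n) * (2 / e) ** (n - 1) * (1 + 2 / (n - 2)) ** (n - 1)


for n in (4, 10, 20, 30):
    print(n, tree_probability(n), bound_6_14(n))
```

For $n = 4$ there are 16 trees among the 20 three-edge graphs, so the exact probability is 0.8; by $n = 30$ it is already below one in a thousand.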
Of course, the uniform distribution is not the only possible distribution, but why would anyone want to choose a different one? Suppose we are studying graphs with $n$ vertices and $k$ edges. The fact that the number of edges is fixed can be awkward. For example, suppose $u$, $v$ and $w$ are three distinct vertices. If we know whether or not $\{u, v\}$ is an edge, this information will affect the probability that $\{u, w\}$ is an edge:

• If $\{u, v\}$ is not an edge, there must be $k$ edges among the remaining $N - 1$ possible edges, so the probability that $\{u, w\}$ is an edge is equal to $\frac{k}{N-1}$.

• If $\{u, v\}$ is an edge, there must be $k - 1$ edges among the remaining $N - 1$ possible edges, so the probability that $\{u, w\}$ is an edge is equal to $\frac{k-1}{N-1}$.

Here is a way to avoid that. For each $e \in \mathcal{P}_2(\underline{n})$, make the set {"$e \in G$", "$e \notin G$"} into a probability space by setting $\Pr(e \in G) = p$ and $\Pr(e \notin G) = 1 - p$. We can think of "$e \in G$" as the event in which $e$ is in a randomly chosen graph. Let $\mathcal{G}_p(n)$ be the product of these probability spaces over all $e \in \mathcal{P}_2(\underline{n})$. Let $X(G)$ be the number of edges in $G$. It can be written as a sum of the $N$ independent random variables

\[ X_{u,v}(G) = \begin{cases} 1 & \text{if } \{u, v\} \in G, \\ 0 & \text{if } \{u, v\} \notin G. \end{cases} \tag{6.15} \]

By independence, $\mathrm{E}(X) = pN$ and $\mathrm{var}(X) = Np(1-p)$ because $p(1-p)$ is the variance of a (0,1)-valued random variable whose probability of being 1 is $p$. With $p = k/N$, we expect to see $k$ edges with variance $k(1-p) < k$. By Chebyshev's inequality (C.3) (p. 385)

\[ \Pr\bigl(|X - k| > C k^{1/2}\bigr) < 1/C^2. \]

Thus, with $C = k^{1/6}$, dividing $|X - k| > k^{2/3}$ by $k$, and using $\Pr(A^c) = 1 - \Pr(A)$, we have

\[ \Pr\left(\frac{|X - k|}{k} \le k^{-1/3}\right) > 1 - k^{-1/3}. \tag{6.16} \]

Since $(|X - k|/k) \times 100$ is the percentage deviation of $X$ from $k$, (6.16) tells us that this deviation is very likely to be small when $k$ is large.¹

Because a random graph in $\mathcal{G}_p(n)$ has very nearly $pN$ edges, results for $\mathcal{G}_p(n)$ with $p = k/N$ almost always hold for $\mathcal{G}(n, k)$ as well. Since $\mathcal{G}_p(n)$ is usually easier to study than $\mathcal{G}(n, k)$, people often study it.

If we want to consider all graphs with vertex set $\underline{n}$ with each of the $2^N$ graphs equally likely, we simply study $\mathcal{G}_{1/2}(n)$ because any particular graph with $q$ edges occurs with probability

\[ (1/2)^q (1 - 1/2)^{N-q} = (1/2)^N = 2^{-N}, \]

a value that is the same for all $n$-vertex graphs.

¹ For those familiar with the normal approximation to the binomial distribution, the number of edges is binomially distributed. Using this, one can avoid using Chebyshev's inequality and derive stronger results than (6.16).
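Because the number of edges in $\mathcal{G}_p(n)$ is binomially distributed, the Chebyshev bound $\Pr(|X - k| > C\sqrt{k}) < 1/C^2$ can be compared with the exact tail probability. The Python sketch below does this for one choice of parameters; the function name `edge_count_tail` and the values $n = 20$, $k = 60$, $C = 2$ are assumptions made for illustration only.

```python
from math import comb, sqrt


def edge_count_tail(n, k, c_const):
    """Exact Pr(|X - k| > c_const * sqrt(k)) where X is the number of edges
    in G_p(n) with p = k/N and N = C(n, 2), so X is Binomial(N, p)."""
    N = comb(n, 2)
    p = k / N
    t = c_const * sqrt(k)
    return sum(comb(N, x) * p ** x * (1 - p) ** (N - x)
               for x in range(N + 1) if abs(x - k) > t)


tail = edge_count_tail(20, 60, 2)
print(tail, "Chebyshev bound:", 1 / 2 ** 2)
```

The exact tail is much smaller than the bound of 1/4, which is consistent with the remark in the footnote that stronger results than Chebyshev's inequality are available.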
Example 6.13 The clique number of a random graph A clique in a graph $G$ is a subgraph $H$ such that every pair of vertices of $H$ is connected by an edge. The size of the clique is the number of vertices of $H$. The clique number of a graph is the size of its largest clique. A $k$-vertex clique is called a $k$-clique.

What can we say about the clique number of a random graph; that is, a graph chosen using $\mathcal{G}_{1/2}(n)$? We'll get an upper bound on this number for most graphs.

Notice that if a graph contains a $K$-clique, then it contains a $k$-clique for all $k < K$. Thus if a graph does not contain a $k$-clique, its clique number must be less than $k$. If we can show that most $n$-vertex graphs do not have a $k$-clique, it will follow that the clique number of most $n$-vertex graphs is less than $k$.

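We'll begin by looking at the expected number of $k$-cliques in an $n$-vertex graph. When $W \subseteq \underline{n}$, let $X_W$ be 1 if the vertices $W$ form a clique and 0 otherwise. The probability that $X_W = 1$ is $(1/2)^{\binom{|W|}{2}}$ since there are $\binom{|W|}{2}$ pairs of vertices in $W$, each of which must be connected by an edge to form a clique. Edges that do not connect two vertices in $W$ don't matter; they can be present or absent. Summing over all $k$-element subsets of $\underline{n}$, we have

\[ \begin{aligned} \mathrm{E}(\text{number of } k\text{-cliques}) &= \mathrm{E}\Bigl(\sum_{\substack{W \subseteq \underline{n} \\ |W| = k}} X_W\Bigr) = \sum_{\substack{W \subseteq \underline{n} \\ |W| = k}} \mathrm{E}(X_W) = \sum_{\substack{W \subseteq \underline{n} \\ |W| = k}} \Pr(X_W = 1) \\ &= \sum_{\substack{W \subseteq \underline{n} \\ |W| = k}} (1/2)^{\binom{k}{2}} = \binom{n}{k} 2^{-\binom{k}{2}}. \end{aligned} \]

Since the number of $k$-cliques in a graph is a nonnegative integer,

\[ \begin{aligned} \Pr(\text{at least one } k\text{-clique}) &= \sum_{j > 0} \Pr(\text{exactly } j \ k\text{-cliques}) \\ &\le \sum_{j > 0} j \Pr(\text{exactly } j \ k\text{-cliques}) = \mathrm{E}(\text{number of } k\text{-cliques}) \\ &= \binom{n}{k} 2^{-\binom{k}{2}} < \frac{n^k}{k!}\, 2^{-\binom{k}{2}} = \frac{(n 2^{-k/2})^k}{2^{-k/2}\, k!}. \end{aligned} \]

Since $2^{-k/2} k!$ is large when $k$ is large, the probability of a $k$-clique will be small when $k/2 \ge \log_2 n$. Thus almost all graphs have clique number less than $2 \log_2 n$.

What can be said in the other direction? A lower bound is given in Exercise 6.5.6. □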
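The bound is easy to evaluate exactly. The following Python sketch computes the expected number of $k$-cliques and the cruder upper bound $n^k 2^{-\binom{k}{2}}/k!$ for $n = 1000$ and the smallest $k$ with $k/2 \ge \log_2 n$. The function names and the choice of $n$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import ceil, comb, factorial, log2


def expected_cliques(n, k):
    """E(number of k-cliques) = C(n, k) * 2^(-C(k, 2)) in G_{1/2}(n)."""
    return Fraction(comb(n, k), 2 ** comb(k, 2))


def upper_bound(n, k):
    """The cruder bound n^k / k! * 2^(-C(k, 2))."""
    return Fraction(n ** k, factorial(k) * 2 ** comb(k, 2))


n = 1000
k = ceil(2 * log2(n))  # smallest integer k with k/2 >= log2(n)
print(k, float(expected_cliques(n, k)), float(upper_bound(n, k)))
```

Using `Fraction` keeps the arithmetic exact, which matters here because the numbers involved are far too small or too large for careless floating-point products.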


Example 6.14 Triangles in random graphs Using $\mathcal{G}_p(n)$ for our probability space, we want to look at the number of triangles in a random graph. For $u, v, w \in \underline{n}$, let

\[ X_{u,v,w} = \begin{cases} 1 & \text{if the edges } \{u, v\}, \{u, w\}, \{v, w\} \text{ are present,} \\ 0 & \text{otherwise.} \end{cases} \]

We claim $\Pr(X_{u,v,w} = 1) = p^3$. How can we see this? Intuitively, each edge has probability $p$ of being present and they are independent, so we get $p^3$. More formally, since our probability space is a product space,

\[ \Pr(X_{u,v,w} = 1) = \Pr(X_{u,v} = 1)\,\Pr(X_{u,w} = 1)\,\Pr(X_{v,w} = 1) = p^3, \]

where the $X_{x,y}$ are the random variables defined in (6.15).

Thus the expected value of $X_{u,v,w}$ is $p^3$. Since there are $\binom{n}{3}$ choices for $\{u, v, w\}$, the expected number of triangles is $\binom{n}{3} p^3$.
Let's compute the variance in the number of triangles. We have to be careful. Triangles are not independent because they may share edges. The safest approach is to define a random variable $T$ that equals the number of triangles:

$$
T = \sum_{t \in \mathcal{P}_3(\underline{n})} X_t, \quad\text{where}\quad X_{\{u,v,w\}} = X_{u,v,w}.
$$

Since $\mathrm{var}(T) = E(T^2) - E(T)^2$ and we know $E(T) = \binom{n}{3} p^3$, we need to compute $E(T^2)$. It equals $\sum E(X_s X_t)$, the sum ranging over all $s, t \in \mathcal{P}_3(\underline{n})$. There are three cases to consider:

• $s = t$ There are $\binom{n}{3}$ terms like this and $E(X_s X_t) = E(X_s^2) = E(X_s) = p^3$.

• $|s \cap t| = 2$ There are $\binom{n}{3}\binom{3}{2}\binom{n-3}{1}$ terms like this since we have $\binom{n}{3}$ choices for $s$, $\binom{3}{2}$ ways to select two elements of $s$ to include in $t$, and $\binom{n-3}{1}$ ways to complete $t$. Since there are a total of five edges in the two triangles, $E(X_s X_t) = p^5$.

• $|s \cap t| < 2$ Since there are a total of $\binom{n}{3}^2$ terms in the sum $\sum E(X_s X_t)$ and we have dealt with $\binom{n}{3} + \binom{n}{3} 3(n-3)$ terms already, there are

$$
\binom{n}{3}^2 - \binom{n}{3} - \binom{n}{3} 3(n-3)
$$

remaining. Since there are six edges in the two triangles, $E(X_s X_t) = p^6$.

Putting all this together

$$
\begin{aligned}
\mathrm{var}(T) &= \binom{n}{3} p^3 + \binom{n}{3} 3(n-3) p^5 + \left(\binom{n}{3}^2 - \binom{n}{3} - \binom{n}{3} 3(n-3)\right) p^6 - \left(\binom{n}{3} p^3\right)^2 \\
&= \binom{n}{3} 3(n-3) p^5 (1-p) + \binom{n}{3} p^3 (1-p^3).
\end{aligned}
$$

Now we want to use Chebyshev's inequality to find out when most graphs have at least one triangle. Chebyshev's inequality (C.3) (p. 385) is

$$
\Pr\bigl(|T - E(T)| > t\sqrt{\mathrm{var}(T)}\bigr) < 1/t^2.
$$

If $T = 0$, then $|T - E(T)| = E(T) > E(T) - 1$. If we set $t\sqrt{\mathrm{var}(T)} = E(T) - 1$, Chebyshev's inequality tells us the probability that there is no triangle is less than $1/t^2$, a number that we want to be small. Solving $t\sqrt{\mathrm{var}(T)} = E(T) - 1$ for $t$ and putting it all together with our previous calculations:

$$
\Pr(T = 0) < \frac{\mathrm{var}(T)}{(E(T) - 1)^2} = \frac{\binom{n}{3} 3(n-3) p^5 (1-p) + \binom{n}{3} p^3 (1-p^3)}{\left(\binom{n}{3} p^3 - 1\right)^2}.
$$


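When $p$ is such that $p$ is small and $\binom{n}{3} p^3$ is large, we have $\binom{n}{3} p^3 \sim n^3 p^3/6$ and so we are assuming that $L = np$ is large. Using this,

$$
\frac{\mathrm{var}(T)}{(E(T) - 1)^2} \sim \frac{n^4 p^5/2 + n^3 p^3/6}{n^6 p^6/36} = \frac{6(3Lp + 1)}{L^3} < \frac{18}{L^2} + \frac{6}{L^3}.
$$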

We've shown that, if $np$ is large and $p$ is small, then a random graph in $\mathcal{G}_p(n)$ almost certainly contains a triangle. If we let $p$ be larger, then edges are more likely and so triangles are more likely. Thus we don't need "$p$ is small." In summary, if $np$ is large then a random graph in $\mathcal{G}_p(n)$ almost certainly contains a triangle. □
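The variance formula is easy to get wrong, so here is a small sanity check, offered as a sketch rather than as part of the argument. It compares the closed forms for $E(T)$ and $\mathrm{var}(T)$ with a brute-force enumeration of every graph on a few vertices, using exact rational arithmetic. The function names are hypothetical and the enumeration is feasible only for tiny $n$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def triangle_mean_var(n, p):
    """Closed forms from the example: E(T) and var(T) in G_p(n)."""
    c = comb(n, 3)
    mean = c * p**3
    var = c * 3 * (n - 3) * p**5 * (1 - p) + c * p**3 * (1 - p**3)
    return mean, var


def brute_force_mean_var(n, p):
    """Enumerate all 2^C(n,2) graphs; returns (E(T), var(T)) exactly."""
    pairs = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    m1 = m2 = Fraction(0)
    for present in product((0, 1), repeat=len(pairs)):
        edges = {e for e, bit in zip(pairs, present) if bit}
        weight = Fraction(1)
        for bit in present:
            weight *= p if bit else 1 - p  # product-space probability of this graph
        t = sum(1 for (u, v, w) in triples
                if (u, v) in edges and (u, w) in edges and (v, w) in edges)
        m1 += weight * t
        m2 += weight * t * t
    return m1, m2 - m1 * m1


def chebyshev_bound(n, p):
    """The bound var(T)/(E(T)-1)^2 on Pr(T = 0); meaningful only when E(T) > 1."""
    mean, var = triangle_mean_var(n, p)
    return var / (mean - 1) ** 2


if __name__ == "__main__":
    p = Fraction(1, 3)
    print(triangle_mean_var(5, p) == brute_force_mean_var(5, p))
    print(chebyshev_bound(1000, 0.05))
```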
Example 6.15 Growing random graphs Imagine starting out with vertices $V = \underline{n}$ and then growing a simple graph by randomly adding edges one by one. We'll describe the probable growth of such a graph.

In  the  first  stage,  we  have  a  lot  of  isolated  vertices  (vertices  on  no  edges)  and  pairs  of  vertices 
joined  by  an  edge.  As  time  goes  by  (more  edges  added),  the  single  edges  join  up  forming  lots  of 
small  trees  which  continue  to  grow.  Next  the  trees  start  developing  cycles  and  so  are  no  longer  trees. 
Of  course  there  are  still  a  lot  of  isolated  vertices  and  small  trees  around.  Suddenly  a  threshold  is 
passed  and,  in  the  blink  of  an  eye,  we  have  one  large  (connected)  component  and  lots  of  smaller 
components,  most  of  which  are  trees  and  isolated  vertices.  The  large  component  starts  "swallowing" 
the  smaller  ones,  preferring  to  swallow  the  larger  ones  first.  Finally,  all  that  is  left  outside  the  large 
component  is  a  few  isolated  vertices  which  are  swallowed  one  by  one  and  so  the  graph  is  connected. 
Growth  continues  beyond  this  point,  but  we'll  stop  here. 

When  does  the  graph  get  connected? 

We  can't  answer  this  question;  however,  we  can  easily  compute  the  expected  number  of  isolated 
vertices  in  a  random  graph.  When  this  number  is  near  zero,  we  expect  most  random  graphs  to  be 
connected.  When  it  is  not  near  zero,  we  expect  a  significant  number  of  random  graphs  to  still  contain 
isolated vertices. We'll study this with the $\mathcal{G}_p(n)$ model. This isn't quite the correct thing to do since we're adding edges one by one. However, it's harder to use $\mathcal{G}(n,k)$ and we're not planning on proving anything; we just want to get an idea of what's true.

In $\mathcal{G}_p(n)$, a vertex $v$ will be isolated if none of the $n-1$ possible edges connecting it to the rest of the graph are present. Thus the probability that $v$ is isolated is $(1-p)^{n-1}$. Since there are $n$ vertices, the expected number of isolated vertices is $n(1-p)^{n-1}$. When $p$ is small, $1 - p \approx e^{-p}$ and so the expected number of isolated vertices is about $n e^{-p(n-1)} = e^{\ln n - p(n-1)}$. This number will be near zero if $p(n-1) - \ln n$ is a large positive number. This is the same as $pn - \ln n$ being large and positive. In this case, isolated vertices are unlikely. In other words, they've all probably been swallowed by the big component and the graph is connected. On the other hand, if $pn - \ln n$ is large and negative, $e^{\ln n - p(n-1)}$ will be large. In other words, we expect a lot of isolated vertices. Thus $p \approx \frac{\ln n}{n}$ is the critical point when a graph becomes connected. Since we expect about $\binom{n}{2} p \approx \frac{n \ln n}{2}$ edges, a graph should become connected when it has around $\frac{n \ln n}{2}$ edges. □
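To see how sharply the expected number of isolated vertices changes near $p = (\ln n)/n$, one can tabulate $n(1-p)^{n-1}$ against the approximation $e^{\ln n - p(n-1)}$. The sketch below does this for $p = c\,(\ln n)/n$; the multiplier `c` and the function names are assumptions introduced for illustration.

```python
from math import exp, log


def expected_isolated(n: int, p: float) -> float:
    """Exact expected number of isolated vertices in G_p(n)."""
    return n * (1 - p) ** (n - 1)


def approx_isolated(n: int, p: float) -> float:
    """The approximation e^{ln n - p(n-1)}, valid when p is small."""
    return exp(log(n) - p * (n - 1))


if __name__ == "__main__":
    n = 10_000
    for c in (0.5, 1.0, 2.0):
        p = c * log(n) / n  # c < 1: below the critical point; c > 1: above it
        print(c, expected_isolated(n, p), approx_isolated(n, p))
```

Below the critical point the expectation is large, and above it the expectation is close to zero, which matches the heuristic in the example.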

So  far  we've  studied  random  graphs  and  asked  what  can  be  expected.  Now  we'll  look  at  a 
different  problem:  we  want  to  guarantee  that  something  must  happen  in  a  graph. 

Example 6.16 Bipartite subgraphs A graph $(V', E')$ is bipartite if its vertices can be partitioned into two sets $V_1'$ and $V_2' = V' - V_1'$ such that the edges of the graph only connect vertices in $V_1'$ to vertices in $V_2'$. In other words, if $\{x, y\} \in E'$, then one of $x, y$ is in $V_1'$ and the other is in $V_2'$.

Given a graph $G = (V, E)$ with $n$ vertices and $k$ edges, we want to find a bipartite subgraph with as many edges as possible. How many edges can we guarantee being able to find? Since we want as many edges as possible, we may as well use all the vertices in $G$. Thus our bipartite graph will be $(V, E')$ where $V$ is partitioned into $V_1'$ and $V_2'$ and $E' \subseteq E$ are those edges which connect a vertex in $V_1'$ to a vertex in $V_2'$. Thus, our partition $V_1', V_2'$ determines $E'$.

Since  the  example  is  in  this  section,  you  can  tell  we're  going  to  use  probability  somehow.  But 
how?  (Our  previous  methods  won't  work  since  we  are  given  a  particular  graph  G.)  We  can  choose 
the  partition  of  V  randomly. 

Our probability space will be the uniform probability on the set of all subsets of $V$. A subset will be $V_1'$ and its complement $V - V_1'$ will be $V_2'$. For every edge $e = \{x, y\} \in E$, define a random variable $X_e$ by

$$
X_e(V_1') = \begin{cases} 0, & \text{if both ends of $e$ are in $V_1'$ or in $V_2'$,} \\ 1, & \text{if one end of $e$ is in $V_1'$ and the other in $V_2'$.} \end{cases}
$$

You should be able to see that the number of edges in the bipartite graph is $X(V_1')$, which we define by

$$
X(V_1') = \sum_{e \in E} X_e(V_1').
$$

We can think of the probability space as a product space as follows. For each $v \in V$, choose it with probability 1/2, independent of the choices made on the other vertices. The chosen vertices form $V_1'$. The probability of choosing $V_1'$ is

$$
(1/2)^{|V_1'|} (1/2)^{|V - V_1'|} = (1/2)^{|V|},
$$

where $(1/2)^{|V_1'|}$ is the probability of choosing each of the vertices in $V_1'$ and $(1/2)^{|V - V_1'|}$ is the probability of not choosing each of the vertices in $V - V_1'$. Thus, this probability space gives equal probability to every subset of $V$, just like our original space. Consequently, we can carry out calculations in either our original probability space or the product space we just introduced. Since the product space has a lot of independence built into it, calculation here is often easier. In particular, $\Pr(X_e = 1) = 1/2$ for any edge $e = \{u, v\}$ since $X_e = 0$ if and only if we make the same decision for $u$ and $v$ (both included or both excluded) and this has probability $(1/2)^2 + (1/2)^2 = 1/2$. Thus $E(X_e) = 1 \times (1/2) + 0 \times (1/2) = 1/2$.

Since a random variable must sometimes be at least as large as its average, there must be a $V_1' \subseteq V$ with $X(V_1') \ge E(X)$. Thus there is a bipartite subgraph with at least $E(X)$ edges. Since $E(X_e) = 1/2$ and expectation is linear, $E(X) = k/2$. Since the expected number of edges in a randomly constructed bipartite subgraph is $k/2$, at least one of these subgraphs has at least $k/2$ edges. In other words, there is a bipartite subgraph containing at least half the edges of $G$. □
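The averaging argument can be checked directly on a small graph. The sketch below, with hypothetical helper names, computes the number of crossing edges for every subset $V_1'$ (so the average is exactly $k/2$) and also draws one random partition as in the product-space description. The exhaustive part is exponential in the number of vertices and is meant only as an illustration.

```python
import random
from itertools import combinations


def cut_size(edges, side1):
    """Number of edges with exactly one end in side1."""
    return sum((u in side1) != (v in side1) for u, v in edges)


def all_cut_sizes(vertices, edges):
    """Cut size for every subset V1' of the vertex set (tiny graphs only)."""
    vertices = list(vertices)
    sizes = []
    for r in range(len(vertices) + 1):
        for subset in combinations(vertices, r):
            sizes.append(cut_size(edges, set(subset)))
    return sizes


def random_bipartite_subgraph(vertices, edges, rng=random):
    """Put each vertex in V1' with probability 1/2 and keep the crossing edges."""
    side1 = {v for v in vertices if rng.random() < 0.5}
    return side1, [(u, v) for u, v in edges if (u in side1) != (v in side1)]


if __name__ == "__main__":
    V = range(5)
    E = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]  # a 5-cycle plus a chord
    sizes = all_cut_sizes(V, E)
    print(sum(sizes) / len(sizes), max(sizes))  # the average is k/2 = 3
    print(random_bipartite_subgraph(V, E))
```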


Exercises 


Remember  the  following! 

• Expectation is linear: $E(X_1 + \cdots + X_k) = E(X_1) + \cdots + E(X_k)$.

• $\Pr(A_1 \cup \cdots \cup A_m) \le \Pr(A_1) + \cdots + \Pr(A_m)$, especially when $\Pr(A_1) = \cdots = \Pr(A_m)$.

6.5.1. Compute the following for a random graph in $\mathcal{G}_p(n)$.

(a)  The  expected  number  of  vertices  of  degree  d. 

Hint. Let $X_v = 1$ if $v$ has degree $d$ and $X_v = 0$ otherwise. Study $X_v$.

(b)  The  expected  number  of  4-cycles. 

(c)  The  expected  number  of  induced  4-cycles.  (An  induced  subgraph  of  a  graph  G  is  a  subset  of 
vertices  together  with  all  the  edges  in  G  that  connect  the  vertices.) 

6.5.2. An embedding of a simple graph $H = (V_H, E_H)$ into a simple graph $G = (V_G, E_G)$ is an injection $\varphi: V_H \to V_G$ such that $\varphi(E_H) \subseteq E_G$, where we define $\varphi(\{u, v\}) = \{\varphi(u), \varphi(v)\}$. If $\varphi(E_H) = E_G \cap \mathcal{P}_2(\varphi(V_H))$, we call the embedding induced.

(a) Prove that the expected number of embeddings of $H$ in a random graph in $\mathcal{G}_p(n)$ when $n \ge |V_H|$ is

$$
n(n-1)\cdots(n - |V_H| + 1)\, p^{|E_H|} = \frac{n!}{(n - |V_H|)!}\, p^{|E_H|}.
$$

(b) Repeat (a) for induced embeddings.

(c) In Example 6.14, we showed that the expected number of triangles in a random graph is $\binom{n}{3} p^3$. If part (a) of the present exercise is applied when $H$ is a triangle, we obtain $6\binom{n}{3} p^3$ for the expected number of embeddings. Explain the difference.
6.5.3. In this exercise we'll find a bound on $k$ (as a function of $n$) so that most graphs in $\mathcal{G}(n, k)$ do not have cycles. Since we know most graphs in $\mathcal{G}(n, n-1)$ have cycles, we'll assume $k < n-1$. For $C \subseteq \underline{n}$, let $\mathcal{G}_C$ be those graphs in which $C$ is a cycle. As usual, $N = \binom{n}{2}$.

(a) Show that

$$
\Pr(G \text{ has a cycle}) \le \sum_C \Pr(\mathcal{G}_C),
$$

where the sum is over all subsets of $\underline{n}$ that contain at least three elements.

(b) By arranging the vertices in $C$ in a cycle and then inserting the remaining edges, prove that

$$
\Pr(\mathcal{G}_C) = \frac{(c-1)!}{2} \binom{N-c}{k-c} \bigg/ \binom{N}{k}, \quad\text{where } c = |C|.
$$

Remember that, unlike cycles in permutations, cycles in graphs do not have a direction.

(c) Conclude that

$$
\Pr(G \text{ has a cycle}) \le \sum_{c=3}^{n} \binom{n}{c} \frac{(c-1)!}{2} \binom{N-c}{k-c} \bigg/ \binom{N}{k}.
$$

(d) Show that the term $c$ in the previous sum equals

$$
\frac{k!}{2c\,(k-c)!} \prod_{i=0}^{c-1} \frac{n-i}{N-i} < \frac{k^c}{2c} \left(\frac{n}{N}\right)^c.
$$

6.5.4. We'll redo the previous exercise using $\mathcal{G}_p(n)$.

(a) Let $v_1, \ldots, v_k$ be a list of vertices. Compute the probability that $v_1, \ldots, v_k, v_1$ is a cycle.

(b) Show that the probability that a random graph in $\mathcal{G}_p(n)$ contains a $k$-cycle is less than $n^k p^k$.

(c) Show that the probability that a random graph in $\mathcal{G}_p(n)$ has a cycle is less than $(pn)^3$ when $pn < 2/3$.

6.5.5.  This  exercise  relates  to  Example  6.16. 

(a) A simple graph $G = (V, E)$ is complete if it contains all possible edges; that is, $E = \mathcal{P}_2(V)$. Prove that, if $|V| = 2n$, we can construct a bipartite subgraph with $n^2$ edges. Obtain a similar result if $|V| = 2n + 1$.

(b) How close is this result to the lower bound in the example?

(c) Prove that when $|V|$ and $k$ are large, there is a graph $G = (V, E)$ such that $|E| = k$ and the relative error between the lower bound and the best bipartite subgraph of $G$ is $O(1/k^{1/2})$. (Of course we need $k \le |\mathcal{P}_2(V)|$ or there will be no simple graph.)

Hint. Use a complete graph in your construction.

(d) Prove that, if $G = (V, E)$ can be properly colored using three colors, then it has a bipartite subgraph with at least $2|E|/3$ edges.

6.5.6. We want to find a number $k$ (depending on $n$) such that most $n$-vertex graphs have a $k$-clique. Divide the vertices into $\lfloor n/k \rfloor$ sets of size $k$ and one (possibly empty) smaller set.

(a) Show that the probability that none of these $\lfloor n/k \rfloor$ sets is a $k$-clique is $\bigl(1 - 2^{-\binom{k}{2}}\bigr)^{\lfloor n/k \rfloor}$.

(b) It can be shown that $1 - x < e^{-x}$ for $x > 0$. Using this and the previous part, conclude that, for some constant $A$, almost all $n$-vertex graphs have a $k$-clique when $k \le A(\log n)^{1/2}$.
Finite State Machines


A "finite state machine" is simply a device that can be in any one of a finite number of situations and is able to move from one situation to another. The classic example (and motivation for the subject) is the digital computer. If no peripherals are attached, then the state at any instant is what is stored in the machine. You may object that this fails to take into account what instruction the machine is executing. Not so; that information is stored temporarily in parts of the machine's central processing unit. We can expand our view by allowing input and output to obtain a finite state machine with I/O.

By formalizing the concept of a finite state machine, computer scientists hope to capture the essential features of some aspects of computing. In this section we'll study a very restricted formalization. These restricted devices are called "finite automata" or "finite state machines." The input to such machines is fed in one symbol at a time and cannot be reread by the machine.

Turing  Machines 


A Turing machine, introduced by A.M. Turing in 1937, is a more flexible concept than a finite automaton. It is equipped with an arbitrarily long tape which it can reposition, read and write. To run the machine, we write the input on a blank tape, position the tape in the machine and turn the machine on. We can think of a Turing machine as computing a function: the input is an element of the function's domain and the output is an element of the function's range, namely the value of the function at that input. The input and/or the output could be nothing. In fact, the domain of the function is the set of all finite strings of symbols, where each symbol must be from some finite alphabet; e.g., {0, 1}. Of course, the input might be something the machine wasn't designed to handle, but it will still do something.

How complicated a Turing machine might we need to build? Turing proved that there exists a "universal" Turing machine U by showing how to construct it. If U's input tape contains

1. D(T), a description of any Turing machine T and

2. the input I for the Turing machine T,

then U will produce the same output that would have been obtained by giving T the input I. This says that regardless of how complicated an algorithm we want to program, there is no need to build more than one Turing machine, namely the universal one U. Of course, it might use a lot of time and a lot of tape to carry out the algorithm, so it might not be practical. Surprisingly, it can be shown that U will, in some sense, be almost as fast as the Turing machine that it is mimicking. This makes it possible to introduce a machine independent measure of the complexity of a function.

Although  Turing  machines  seem  simple,  it  is  believed  that  anything  that  can  be  computed  by 
any  possible  computer  can  be  computed  by  a  Turing  machine.  (This  is  called  Church's  Thesis.)  Such 
computable  functions  are  called  recursive  functions.  Are  there  any  functions  which  are  not  recursive? 

Examples of nonrecursive functions are not immediately obvious. Here's one. "Given a Turing machine T and input I, will the Turing machine eventually stop?" As phrased this isn't quite a fair function since it's not input for a Turing machine. We can change it slightly: "Given D(T) (a machine readable description of T) and the input I, will the universal Turing machine U eventually stop?" For obvious reasons, this is called the halting problem. You may wonder why this can be thought of as a function. The domain of the function is all possible pairs D, I where D is any machine readable description of a machine and I is any possible input for a machine. The range is {"yes", "no"}. The machine U computes the value of the function.

Theorem  6.10    The  halting  problem  is  nonrecursive. 
Proof: We can't give a rigorous proof here; in fact, we haven't even defined our terms precisely. Nevertheless, we can give the idea for a proof. Suppose there existed a Turing machine H that could solve the halting problem. Create a Turing machine B that contains within itself what is essentially a subroutine equivalent to H and acts as follows. Whatever is written on the input tape, it makes a copy of it. It then "calls" H to process as input the original input together with the copy. If H says that the answer is "Doesn't stop," then B stops; otherwise B enters an infinite loop.

How does this rather strange machine behave? Suppose B is given D(T) as input. It then passes to H the input D(T) D(T) and so H solves the halting problem for T with D(T) as input. In other words:

If B is given D(T), it halts if and only if T with input D(T) would run for ever. (6.17)

What does B do with the input D(B)? We simply use (6.17) with T equal to B. Thus, B with input D(B) halts if and only if B with input D(B) runs for ever. Since this is self-contradictory, there is either an error in the proof, an inconsistency in mathematics, or a mistake in assuming the existence of H. We believe that the last is the case: There cannot be a Turing machine H to solve the halting problem. □

We can describe the previous proof heuristically. If H exists, then it predicts the behavior of any Turing machine. B with input D(B) is designed to ask H how it will behave and then do the opposite of the prediction.
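The "do the opposite" idea can be mimicked with a toy model. The sketch below is only an illustration, not a proof: a claimed halting oracle is modeled as an ordinary Python function, machine descriptions are plain strings, and B's behavior is reported as a Boolean instead of actually looping. The names `Oracle`, `D_B`, `b_halts_on` and `oracle_is_wrong_about_B` are hypothetical.

```python
from typing import Callable

# A claimed oracle: (description of a machine, input) -> True if it is said to halt.
Oracle = Callable[[str, str], bool]

D_B = "D(B)"  # stand-in for the machine readable description of B


def b_halts_on(oracle: Oracle, tape: str) -> bool:
    """Actual behavior of B: it copies its input, asks the oracle about
    (tape, tape), and halts exactly when the oracle answers "doesn't stop"."""
    return not oracle(tape, tape)


def oracle_is_wrong_about_B(oracle: Oracle) -> bool:
    """Compare the oracle's prediction for B on D(B) with what B really does."""
    predicted = oracle(D_B, D_B)
    actual = b_halts_on(oracle, D_B)
    return predicted != actual


if __name__ == "__main__":
    candidates = [
        lambda d, i: True,               # "everything halts"
        lambda d, i: False,              # "nothing halts"
        lambda d, i: len(d) % 2 == 0,    # an arbitrary rule
    ]
    print([oracle_is_wrong_about_B(h) for h in candidates])
```

Whatever rule the candidate oracle uses, its prediction for B on input D(B) disagrees with B's behavior, which is the content of (6.17) with T equal to B.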

Finite  State  Machines  and  Digraphs 


Consider a finite state machine that receives input one symbol at a time and enters a new state based on that symbol. We can represent the states of the machine by vertices in a digraph and the effect of the input $i$ in state $s$ by a directed edge that connects $s$ to the new state and contains $i$ and the associated output in its name. The following example should clarify this.

Example 6.17 Binary addition We would like to add together two nonnegative binary numbers and output the sum. The input is given as pairs of digits, one from each number, starting at the right ends (units digits) of the input. The pair 22 marks the end of the input. Thus to add 010 and 110 you would input the four pairs 00, 11, 01 and 22 in that order. In other words, the sum problem

$$
\begin{array}{r}
A_n A_{n-1} \cdots A_1 \\
+\; B_n B_{n-1} \cdots B_1 \\ \hline
C_{n+1} C_n C_{n-1} \cdots C_1
\end{array}
$$

becomes $A_1B_1, \ldots, A_{n-1}B_{n-1}, A_nB_n, 22$.

The output is given as single digits with 2 marking the end of the output, so the output for our example would be 00012. (The sum is backwards, $C_1, \ldots, C_{n-1}, C_n, C_{n+1}, 2$, because the first output is the units digit.) We have two internal states: carry (C) and no carry (N). You should verify that the adder can be described by the table in Figure 6.11. The entry $(o, s_2)$ in position $(s_1, i)$ says that if the machine is in state $s_1$ and receives input $i$, then it will produce output $o$ and move to state $s_2$. It is called the state transition table for the machine. Note that being in state C (carry) and receiving 22 as input causes two digits to be output, the carry digit and the termination digit 2.

We can associate a digraph $(V, E, \varphi)$ with the tabular description, where $V = \{N, C\}$, each edge is a 4-tuple $e = (s_1, i, o, s_2)$, $\varphi(e) = (s_1, s_2)$ and $i$ and $o$ are the associated input and output, respectively. In drawing the picture, a shorthand is used: the label 00, 22 : 1, 12 on the edge from C to N in Figure 6.11 stands for the two edges (C, 00, 1, N) and (C, 22, 12, N).

This  example  is  slightly  deficient.  We  tacitly  assumed  that  everyone  (and  the  machine!)  somehow 
knew  that  the  machine  should  start  in  state  N.  We  should  really  indicate  this  by  labeling  N  as  the 
starting  state. 
          00      01      10      11      22

    N    0, N    1, N    1, N    0, C    2, N
    C    1, N    0, C    0, C    1, C    12, N

[Digraph: vertices N (start) and C; loop at N labeled 00,01,10,22 : 0,1,1,2; edge from N to C labeled 11 : 0; loop at C labeled 01,10,11 : 0,0,1; edge from C to N labeled 00, 22 : 1, 12.]

Figure 6.11 Tabular and graphical descriptions of a finite state machine for adding two binary numbers. The starting and accepting states are both N.


You  can  use  the  associated  digraph  to  see  easily  what  an  automaton  does  with  any  given  input 
string.  Place  your  finger  on  the  starting  state  and  begin  reading  the  input.  Each  time  you  read  an 
input  symbol,  follow  the  directed  edge  that  has  that  input  symbol  to  whichever  state  (vertex)  it 
leads  and  write  down  the  output  that  appears  on  the  edge.  Keep  this  process  up  until  you  have  used 
up all the input. □
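The "finger on the digraph" procedure is easy to simulate. The following sketch encodes the state transition table of Figure 6.11 as a dictionary and runs it on the example input. The helper names `encode`, `run_adder` and `decode` are hypothetical, and `encode` assumes the two binary strings have been padded to the same length.

```python
# (state, input pair) -> (output, next state), copied from the table in Figure 6.11.
ADDER = {
    ("N", "00"): ("0", "N"), ("N", "01"): ("1", "N"), ("N", "10"): ("1", "N"),
    ("N", "11"): ("0", "C"), ("N", "22"): ("2", "N"),
    ("C", "00"): ("1", "N"), ("C", "01"): ("0", "C"), ("C", "10"): ("0", "C"),
    ("C", "11"): ("1", "C"), ("C", "22"): ("12", "N"),
}


def encode(a: str, b: str) -> list:
    """Pair up digits starting at the units end and append the end marker 22."""
    return [x + y for x, y in zip(reversed(a), reversed(b))] + ["22"]


def run_adder(pairs, start="N"):
    """Follow the edges of the digraph, collecting the output on each edge."""
    state, out = start, []
    for pair in pairs:
        symbol, state = ADDER[(state, pair)]
        out.append(symbol)
    return "".join(out), state


def decode(output: str) -> int:
    """Drop the terminating 2 and reverse, since the units digit is output first."""
    return int(output[:-1][::-1], 2)


if __name__ == "__main__":
    out, final_state = run_adder(encode("010", "110"))
    print(out, final_state, decode(out))  # prints: 00012 N 8
```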

Suppose  we  want  a  machine  that  simply  recognizes  a  situation  but  takes  no  action.  We  could 
phrase  this  by  saying  that  the  machine  either  accepts  (recognizes)  or  rejects  an  input  string.  In  this 
case,  we  need  not  have  any  output;  rather,  we  can  label  certain  states  of  the  machine  as  "accepting." 
This  is  represented  pictorially  by  a  double  circle  around  an  accepting  state.  If  the  machine  ends  up 
in  an  accepting  state,  then  the  input  is  accepted;  otherwise,  it  is  rejected. 

Example 6.18 No adjacent ones Let's construct a machine to recognize (accept) all strings of zeroes and ones that contain no adjacent ones. The idea is simple: keep track of what you saw last and if you find a one followed by a one, reject the string. We use three states, labeled 0, 1 and R, where 0 and 1 indicate the digit just seen and R is the reject state. Both 0 and 1 are accepting states. What is the start state? We could add an extra state for this, but we get a smaller machine if we let 0 be the start state. This can be done because the string $s_1 \cdots s_n$ is acceptable if and only if $0 s_1 \cdots s_n$ is. You should be able to draw a diagram of this machine. □
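One way to check a proposed diagram is to simulate it. The sketch below writes the three-state machine described above as a transition dictionary; the names `NO_ADJACENT_ONES` and `accepts` are illustrative assumptions.

```python
# (state, input symbol) -> next state; states 0 and 1 record the digit just seen.
NO_ADJACENT_ONES = {
    ("0", "0"): "0", ("0", "1"): "1",
    ("1", "0"): "0", ("1", "1"): "R",   # a one followed by a one: reject
    ("R", "0"): "R", ("R", "1"): "R",   # once rejected, always rejected
}
ACCEPTING_STATES = {"0", "1"}


def accepts(string: str, start: str = "0") -> bool:
    """Run the automaton from the start state and test the final state."""
    state = start
    for symbol in string:
        state = NO_ADJACENT_ONES[(state, symbol)]
    return state in ACCEPTING_STATES


if __name__ == "__main__":
    for s in ("", "0101", "0110", "1"):
        print(repr(s), accepts(s))
```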

Example 6.19 Divisibility by five Let's construct a machine to recognize (accept) numbers which are divisible by 5. These numbers will be presented from left to right as binary numbers; i.e., starting with the highest order digits. To construct such a machine, we simply design it to carry out the usual long division algorithm that you learned many years ago. At each step in the usual long division algorithm you produce a remainder to which you then append the next digit of the dividend. In effect, this multiplies the remainder by ten and adds the next digit to it. Since we are working in binary, we follow the same process but multiply by two instead of by ten. The digraph for the machine appears in Figure 6.12. No output is shown because there is none. □
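The long division idea translates into a one-line update of the state: the new remainder is twice the old remainder plus the next bit, reduced modulo 5. The sketch below assumes the states are the remainders 0 through 4; the function name `divisible_by_5` is hypothetical.

```python
def divisible_by_5(bits: str) -> bool:
    """Read a binary number from the high order bit, tracking the remainder mod 5."""
    remainder = 0  # the starting state
    for bit in bits:
        # Appending a binary digit doubles the value read so far and adds the digit.
        remainder = (2 * remainder + int(bit)) % 5
    return remainder == 0  # state 0 is the only accepting state


if __name__ == "__main__":
    for n in (0, 5, 7, 10, 1023, 1025):
        print(n, format(n, "b"), divisible_by_5(format(n, "b")))
```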

Here's  a  formal  definition  of  the  concept  of  a  finite  automaton,  which  we've  been  using  rather 
loosely  so  far. 

Definition 6.9 Finite automaton A finite automaton is a quadruple $(S, I, f, s_0)$ where $S$ and $I$ are finite sets, $s_0 \in S$ and $f: S \times I \to S$. $S$ is called the set of states of the automaton, $I$ the set of input symbols and $s_0$ the starting state. If the automaton has accepting states, we append them to the quadruple to give a quintuple. If the automaton has a set of output symbols $O$, then we append it to the tuple and change the definition of $f$ to $f: S \times I \to O \times S$. If an input string leaves the automaton in an accepting state, we say that the automaton accepts the string or that it recognizes the string.
[Digraph omitted.]

Figure 6.12 A machine to test divisibility of binary numbers by 5. The starting and accepting states are both 0. Input begins with the high order (leftmost) bit.


Example 6.20 An automaton grammar We can represent a finite automaton without output in another fashion. For each edge $(s_1, i, s_2)$ we write $s_1 \to i\, s_2$. If $s_2$ is an accepting state, we also write $s_1 \to i$. Suppose we begin with the starting state, say $s_0$, and replace it with the right side of some $s_0 \to \cdots$. If this leads to a string that contains a state $s$, then replace $s$ in the string with the right side of some $s \to \cdots$. After $n$ such steps we will end up with either a string of $n$ input symbols or a string of $n$ input symbols followed by a state.

We claim that any string of input symbols (with no appended state) that can be produced in this fashion is accepted and conversely. Why is this true? Our replacement process mimics travelling along the digraph as dictated by the input string. We can only omit the state symbol at the end by moving to an accepting state. This is an example of a "grammar." We'll say more about grammars in Section 9.2. □
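The claim can be tested on a small machine. The sketch below builds the productions for the "no adjacent ones" automaton of Example 6.18 (its edge list is restated here so the block is self-contained) and generates every string of $n$ input symbols obtainable in $n$ replacement steps. The names `productions`, `generated` and `strings_without_11` are hypothetical, and `None` is used as a marker for a production of the form $s_1 \to i$.

```python
from itertools import product

# Edges (s1, i, s2) of the automaton of Example 6.18.
EDGES = [("0", "0", "0"), ("0", "1", "1"), ("1", "0", "0"),
         ("1", "1", "R"), ("R", "0", "R"), ("R", "1", "R")]
START, ACCEPTING = "0", {"0", "1"}


def productions(edges, accepting):
    """s1 -> i s2 for each edge, plus s1 -> i whenever s2 is accepting."""
    rules = {}
    for s1, i, s2 in edges:
        rules.setdefault(s1, []).append((i, s2))
        if s2 in accepting:
            rules.setdefault(s1, []).append((i, None))
    return rules


def generated(rules, start, n):
    """Strings of n input symbols, with no trailing state, derivable in n steps."""
    partial = {("", start)}  # (symbols so far, trailing state)
    finished = set()
    for _ in range(n):
        finished, next_partial = set(), set()
        for prefix, state in partial:
            for symbol, target in rules.get(state, []):
                if target is None:
                    finished.add(prefix + symbol)
                else:
                    next_partial.add((prefix + symbol, target))
        partial = next_partial
    return finished


def strings_without_11(n):
    """The strings of length n that the automaton accepts, found directly."""
    words = ("".join(t) for t in product("01", repeat=n))
    return {w for w in words if "11" not in w}


if __name__ == "__main__":
    rules = productions(EDGES, ACCEPTING)
    print(sorted(generated(rules, START, 3)))
```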

We can attempt to endow our string recognizer with "free will" by allowing random choices. At present, for each state $s \in S$ and each $i \in I$ there is precisely one $t \in S$ such that $(s, i, t)$ is an edge of the digraph. Remove the phrase "precisely one." We can express this in terms of the function $f$ by saying that $f: S \times I \to 2^S$, the subsets of $S$, instead of $f: S \times I \to S$. Here $f(s, i)$ is the set of all states $t$ such that $(s, i, t)$ is an edge.

Since the successor of a state is no longer uniquely defined, what happens when the machine is in state $s$ and receives input $i$? If $f(s, i) = \emptyset$, the empty set, the machine stops and does not accept the string; otherwise, the machine selects by its "free will" any $t \in f(s, i)$ as its next state.

Since the outcome of a given input string is not uniquely determined, how do we define acceptance? We simply require that acceptance be possible: If it is possible for the machine to get from its starting state to an accepting state by any sequence of choices when it receives the input string $X$, then we say the machine accepts $X$. Such a machine is called a nondeterministic finite automaton. What we have called finite automata are often called deterministic finite automata.

Theorem  6.11  No  "freewill"  Given  a  nondeterministic  finite  automaton,  there  exists  a 
(deterministic)  finite  automaton  that  accepts  exactly  the  same  input  strings. 

The  conclusions  of  this  theorem  are  not  as  sweeping  as  our  name  for  it  suggests:  We  are  speaking 
about  a  rather  restricted  class  of  devices  and  are  only  concerned  about  acceptance.  The  rest  of  this 
section  will  be  devoted  to  the  proof. 
Proof: Let $\mathcal{N} = (S, I, f, s_0, A)$, with $f: S \times I \to 2^S$, be a nondeterministic finite automaton. We will construct a deterministic finite automaton $\mathcal{D} = (T, I, g, t_0, B)$ that accepts the same input strings as $\mathcal{N}$.

Let the states $T$ of $\mathcal{D}$ be $2^S$, the set of all subsets of $S$. The initial state of $\mathcal{D}$ will be $t_0 = \{s_0\}$ and the set of accepting states $B$ of $\mathcal{D}$ will be the set of all those subsets of $S$ that contain at least one element from $A$. We now define $g(t, i)$. Let $g(\emptyset, i) = \emptyset$. For $t$ a nonempty subset of $S$, let $g(t, i)$ be the union of $f(s, i)$ over all $s \in t$; that is,

$$
g(t, i) = \bigcup_{s \in t} f(s, i). \qquad (6.18)
$$

This completes the definition of $\mathcal{D}$.

We must prove that $\mathcal{D}$ recognizes precisely the same strings that $\mathcal{N}$ does. Let $t_n$ be the state that $\mathcal{D}$ is in after receiving the input string $X = i_1 \ldots i_n$. We claim that

$\mathcal{N}$ can be in a state $s$ after receiving $X$ if and only if $s \in t_n$. (6.19)

Before proving (6.19), we'll use it to prove the theorem.

Suppose that $\mathcal{N}$ accepts $X$. Then it is possible for $\mathcal{N}$ to reach some accepting state $a \in A$, which is in $t_n$ by (6.19). By the definition of $B$, $t_n \in B$. Thus $\mathcal{D}$ accepts $X$.

Now suppose that $\mathcal{D}$ accepts $X$. Since $t_n$ is an accepting state of $\mathcal{D}$, it follows from the definition of $B$ that some $a \in A$ is in $t_n$. By (6.19), $\mathcal{N}$ can reach $a$ when it receives $X$. We have shown that (6.19) implies the theorem.

It remains to prove (6.19). We'll use induction on $n$. Suppose that $n = 1$. Since $t_0 = \{s_0\}$, it
follows from (6.18) that $t_1 = g(t_0, i_1) = f(s_0, i_1)$. Since this is the set of states that can be reached by $\mathcal{N}$ with
input $i_1$, we are done for $n = 1$.

Suppose that $n > 1$. By (6.18),

$$t_n = \bigcup_{s \in t_{n-1}} f(s, i_n).$$

By the induction assumption, $t_{n-1}$ is the set of states $s$ that $\mathcal{N}$ can be in after receiving the input
$i_1 \ldots i_{n-1}$. By the definition of $f$, $f(s, i_n)$ is the set of states that $\mathcal{N}$ can reach from $s$ with input $i_n$.
Thus $t_n$ is the set of states that $\mathcal{N}$ can be in after receiving the input $i_1 \ldots i_n$. □
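
The construction in the proof can be tried out on a small machine. The following Python sketch is only an illustration under stated assumptions: the dictionary encoding of $f$, the function names and the example automaton (which accepts binary strings whose next-to-last symbol is 1) are all hypothetical and not from the text. Unlike the proof, which uses all of $2^S$ as the state set, the sketch builds only those subsets reachable from $t_0$.

```python
from itertools import chain, product

def nfa_accepts(f, s, A, word):
    """Search every possible run of the nondeterministic machine from state s."""
    if not word:
        return s in A
    return any(nfa_accepts(f, t, A, word[1:]) for t in f.get((s, word[0]), ()))

def determinize(f, s0, A, alphabet):
    """Build g, t0 and B as in the proof, restricted to subsets reachable from t0."""
    t0 = frozenset({s0})
    g, todo, seen = {}, [t0], {t0}
    while todo:
        t = todo.pop()
        for i in alphabet:
            # Equation (6.18): g(t, i) is the union of f(s, i) over all s in t.
            # The empty union gives g(empty set, i) = empty set automatically.
            u = frozenset(chain.from_iterable(f.get((s, i), ()) for s in t))
            g[(t, i)] = u
            if u not in seen:
                seen.add(u)
                todo.append(u)
    B = {t for t in seen if t & A}  # subsets containing an accepting state
    return g, t0, B

def dfa_accepts(g, t0, B, word):
    """Run the deterministic machine; its state is always a set of NFA states."""
    t = t0
    for symbol in word:
        t = g[(t, symbol)]
    return t in B

# Hypothetical example: accept strings over {0, 1} whose next-to-last symbol is 1.
f_example = {
    ("p", "0"): {"p"}, ("p", "1"): {"p", "q"},
    ("q", "0"): {"r"}, ("q", "1"): {"r"},
}

def machines_agree(max_len):
    """Compare both machines on every input string of length at most max_len."""
    g, t0, B = determinize(f_example, "p", {"r"}, "01")
    return all(
        dfa_accepts(g, t0, B, w) == nfa_accepts(f_example, "p", {"r"}, w)
        for n in range(max_len + 1)
        for w in product("01", repeat=n)
    )
```

Restricting to reachable subsets does not change which strings are accepted, since unreachable states of the deterministic machine are never entered.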


Exercises 


6.6.1.  What  is  the  state  transition  table  for  the  automaton  of  Example  6.19? 

6.6.2.  We  now  wish  to  check  for  divisibility  by  3. 

(a)  Give  a  digraph  like  that  in  Example  6.19  for  binary  numbers. 

(b)  Give  the  state  transition  table  for  the  previous  digraph. 

(c)  Repeat  the  two  previous  parts  when  the  input  is  in  base  10  instead  of  base  2. 

*(d)    Construct  an  automaton  that  accepts  decimal  input,  starting  with  the  unit's  digit  and  checks 
for  divisibility  by  3. 

*(e)  Construct  an  automaton  that  accepts  binary  input,  starting  with  the  unit's  digit  and  checks  for 
divisibility  by  3. 

6.6.3.  Design  a  finite  automaton  to  recognize  all  strings  of  zeroes  and  ones  in  which  no  maximal  string  of 
ones  has  even  length.  (A  maximal  string  of  ones  is  a  string  of  adjacent  ones  which  cannot  be  extended 
by  including  an  adjacent  element  of  the  string.) 
6.6.4. An automaton is given by $(\{b, s, d, z\}, \{+, -, 0, 1, \ldots, 9\}, f, b, \{d\})$, where $f(b, +) = s$, $f(b, -) = s$,
$f(t, k) = d$ for $t = b, s, d$ and $0 \le k \le 9$, and $f(t, i) = z$ for all other $(t, i) \in S \times I$.

(a)  Draw  the  digraph  for  the  machine. 

(b)  Describe  the  strings  recognized  by  the  machine. 

(c)  Describe  the  strings  recognized  by  the  machine  if  both  s  and  d  are  acceptance  states. 

6.6.5.  A  "floating  point  number"  consists  of  two  parts.  The  first  part  consists  of  an  optional  sign  followed 
by  a  nonempty  string  of  digits  in  which  at  most  one  decimal  point  may  be  present.  The  second  part  is 
either  absent  or  consists  of  "E"  followed  by  a  signed  integer.  Draw  the  digraph  of  a  finite  automaton 
to  recognize  floating  point  numbers. 

6.6.6. A symbol $i_k$ in a string $i_1 \ldots i_n$ is "isolated" if (i) either $k = 1$ or $i_k \ne i_{k-1}$ and (ii) either $k = n$ or
$i_k \ne i_{k+1}$. For example, 0111010011 contains two isolated zeroes and one isolated one.

(a)  Draw  a  digraph  for  a  finite  automaton  that  accepts  just  those  strings  of  zeroes  and  ones  that 
contain  at  least  one  isolated  one. 

(b)  Now  draw  a  machine  that  accepts  strings  with  precisely  one  isolated  one. 

6.6.7.  In  this  exercise  you  are  to  construct  an  automaton  that  behaves  like  a  vending  machine.  To  keep 
things  simple,  there  are  only  three  items  costing  15,  20  and  25  cents  and  indicated  by  the  inputs 
A,  B  and  C,  respectively.  Other  allowed  inputs  are  5,  10  and  25  cents  and  R  (return  coins).  The 
machine  will  accept  any  amount  of  money  up  to  30  cents.  Additional  coins  will  be  rejected.  When 
input  A,  B  or  C  is  received  and  sufficient  money  is  present,  the  selection  is  delivered  and  change  (if 
any)  returned.  If  insufficient  money  is  present,  no  action  is  taken. 

(a)  Describe  appropriate  states  and  output  symbols.  Identify  the  starting  state. 

(b)  Give  the  state  transition  table  for  your  automaton. 

(c)  Draw  a  digraph  for  your  automaton. 

6.6.8. Suppose that $\mathcal{M} = (S, I, f, s_0, A)$ and $\mathcal{M}' = (S', I, f', s_0', A')$ are two automata with the same input
symbols and with acceptance states $A$ and $A'$ respectively.

(a) Describe in terms of sets and functions an automaton that accepts only those strings acceptable
to both $\mathcal{M}$ and $\mathcal{M}'$.

Hint. The states can be $S \times S'$.

(b) When is there an edge from $(s, s')$ to $(t, t')$ in your new automaton and what input is it associated
with?

(c) Use this idea to describe an automaton that recognizes binary numbers which are divisible by 15
in terms of those in Example 6.19 and Exercise 6.6.2.

(d)  Design  a  finite  automaton  that  recognizes  those  binary  numbers  which  are  either  divisible  by  3 
or  divisible  by  5  or  both. 
Notes and References


Spanning  trees  are  one  of  the  most  important  concepts  in  graph  theory.  As  a  result,  they  are 
discussed  in  practically  every  text.  Our  point  of  view  is  like  that  taken  by  Tarjan  [7;  Ch.6],  who 
treats  the  subject  in  greater  depth. 

An extensive treatment of planarity algorithms can be found in Chapters 6 and 7 of the text by
Williamson [8]. Wilson's book [9] is a readable account of the history of the four color problem.

Since network flows are an extremely important subject, they are covered in many texts.
Papadimitriou and Steiglitz [6] and Tarjan [7] treat the subject extensively. In addition, network
flows are related to linear programming, so you will often find network flows discussed in linear
programming texts such as the book by Hu [4].

The  marriage  theorem  is  a  combinatorial  result  about  sets.  Sperner's  theorem,  which  we  studied 
in  Example  1.22,  is  another  such  result.  For  more  on  this  subject,  see  the  text  by  Anderson  [1]. 
Related  in  name  but  not  in  results  is  the  Stable  Marriage  Problem.  In  this  case,  we  have  two  sets 
of  the  same  size,  say  men  and  women.  Each  woman  ranks  all  of  the  men  and  each  man  all  of  the 
women.  The  men  and  women  are  married  to  each  other.  The  situation  is  considered  stable  if  we 
cannot  find  a  man  and  a  woman  who  both  rank  each  other  higher  than  their  mates.  It  can  be  proved 
that  a  stable  marriage  always  exists.  Gusfield  and  Irving  [3]  discuss  this  problem  and  its  applications 
and  generalizations. 

The  books  on  the  probabilistic  method  are  more  advanced  than  the  discussion  here.  Perhaps 
the  gentlest  book  is  the  one  by  Molloy  and  Reed  [5]. 

Automata are discussed in some combinatorics texts that are oriented toward computer science.
There are also textbooks, such as Drobot [2], devoted to the subject.

MIT Press (1989).

4. T.C. Hu, Integer Programming and Network Flows, Addison-Wesley (1969).

5. Michael Molloy and Bruce Reed, Graph Coloring and the Probabilistic Method, Springer-Verlag
(2002).

6. Christos H. Papadimitriou and Kenneth Steiglitz, Combinatorial Optimization: Algorithms and
Complexity, Dover (1998).

7.  Robert  E.  Tarjan,  Data  Structures  and  Algorithms,  SIAM  (1983). 

8.  S.  Gill  Williamson,  Combinatorics  for  Computer  Science,  Dover  (2002). 

9.  Robin  Wilson,  Four  Colours  Suffice:  How  the  Map  Problem  was  Solved,  Penguin  Press  (2002). 


PART  III 

Recursion 


Recursive  thinking  plays  a  fundamental  role  in  combinatorics,  theoretical  computer  science  and 
programming.  The  next  chapter  introduces  the  recursive  approach  and  the  following  two  chapters 
discuss  important  applications. 

Definition    Recursive  approach    A  recursive  approach  to  a  problem  consists  of  two  parts: 

•  The  problem  is  reduced  to  one  or  more  problems  of  the  same  sort  which  are  simpler  in  some 
sense. 

•  There  is  a  collection  of  simplest  problems  to  which  all  others  are  reduced  after  one  or  more 
steps.  A  solution  to  these  simplest  problems  is  given. 

As you might expect, a recursive algorithm is one that refers to itself. This seemingly simple
notion is extremely important. Recursive algorithms and their translation into recursive procedures
and recursive data structures are fundamental in computer science. For example, here's a recursive
algorithm for sorting a list. (Sorting a list means putting the items in increasing order.)

•  divide  the  list  roughly  in  half, 

•  sort  each  half,  and 

•  "merge"  the  two  sorted  halves. 
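
The three steps above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation; the names merge_sort and merge are our own choices, and a list with at most one item is taken as the simplest case, which is already sorted.

```python
def merge(left, right):
    """Combine two sorted lists into one sorted list."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One of the two lists is exhausted; the rest of the other is already sorted.
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def merge_sort(items):
    """Sort a list by splitting it roughly in half, sorting each half, and merging."""
    if len(items) <= 1:          # simplest problems: nothing to do
        return list(items)
    mid = len(items) // 2        # divide the list roughly in half
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```

Note that the algorithm refers to itself twice, each time on a shorter list, so the reduction must eventually reach lists of length at most one.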

Proof by induction and recursive algorithms are closely related. We'll begin Chapter 7 by examining
inductive proofs and recursive equations. Then we'll look briefly at thinking and programming
recursively.

Suppose  that  we  have  some  items  that  have  a  "natural"  order;  e.g.,  the  natural  order  for  student 
records  might  be 

•  alphabetic  by  name  (last  name  first), 

•  first  by  class  and,  within  a  class,  alphabetic  by  name,  or 

•  by  grade  point  average  with  highest  first. 

We  may  allow  ties.  The  problem  of  sorting  is  as  follows:  Given  a  list  of  items  in  no  particular  order, 
rearrange  it  so  that  it  is  in  its  natural  order.  In  the  event  of  a  tie,  the  relative  order  of  the  tied  items 
is  arbitrary.  In  Chapter  8,  we'll  study  some  of  the  recursive  aspects  of  software  and  hardware  sorting 
algorithms.  Many  of  these  use  the  "divide  and  conquer"  technique,  which  often  appears  in  recursive 
algorithms.  We  end  the  chapter  with  a  discussion  of  this  important  technique. 

One  of  the  most  important  conceptual  tools  in  computer  science  is  the  idea  of  a  rooted  plane 
tree,  which  we  introduced  in  Section  5.4.  This  leads  naturally  to  methods  for  ranking  and  unranking 
various  classes  of  unlabeled  RP-trees.  Many  combinatorial  algorithms  involve  "traversing"  RP-trees. 
Grammars  can  often  be  thought  of  in  terms  of  RP-trees,  and  generating  machine  code  from  a  higher 
level  language  is  related  to  the  traversal  of  such  trees.  These  topics  are  discussed  in  Chapter  9. 
CHAPTER 7


Induction and Recursion


Introduction 


Suppose $A(n)$ is an assertion that depends on $n$. We use induction to prove that $A(n)$ is true when
we show that

•  it's  true  for  the  smallest  value  of  n  and 

•  if  it's  true  for  everything  less  than  n,  then  it's  true  for  n. 

Closely related to proof by induction is the notion of a recursion. A recursion describes how to
calculate a value from previously calculated values. For example, $n!$ can be calculated by

$$n! = \begin{cases} 1, & \text{if } n = 0; \\ n \cdot (n-1)!, & \text{otherwise.} \end{cases}$$

We  discussed  recursions  briefly  in  Section  1.4. 
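
As a small illustration, a recursion for $n!$ translates almost word for word into a recursive function. The sketch below assumes the usual recursion $0! = 1$ and $n! = n \cdot (n-1)!$ for $n > 0$; the function name is our own.

```python
def factorial(n):
    """Compute n! from the previously calculated value (n - 1)!."""
    if n == 0:                       # the value that gets us started
        return 1
    return n * factorial(n - 1)      # each new value depends on an earlier one
```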

Notice  the  similarity  between  the  two  ideas:  There  is  something  to  get  us  started  and  then  each 
new  thing  depends  on  similar  previous  things.  Because  of  this  similarity,  recursions  often  appear  in 
inductively  proved  theorems  as  either  the  theorem  itself  or  a  step  in  the  proof.  We'll  study  inductive 
proofs  and  recursive  equations  in  the  next  section. 

Inductive proofs and recursive equations are special cases of the general concept of a recursive
approach to a problem. Thinking recursively is often fairly easy when one has mastered it. Unfortunately,
people are sometimes defeated before reaching this level. We've devoted Section 2 to helping
you avoid some of the pitfalls of recursive thinking.

In Section 3 we look at some concepts related to recursive algorithms including proving correctness,
recursions for running time, local descriptions and computer implementation.

Not  only  can  recursive  methods  provide  more  natural  solutions  to  problems,  they  can  also  lead 
to  faster  algorithms.  This  approach,  which  is  often  referred  to  as  "divide  and  conquer,"  is  discussed 
in  Section  4.  The  best  sorting  algorithms  are  of  the  divide  and  conquer  type,  so  we'll  see  a  bit  more 
of  this  in  Chapter  8. 
7.1    Inductive Proofs and Recursive Equations


The  concept  of  proof  by  induction  is  discussed  in  Appendix  A  (p.  361).  We  strongly  recommend 
that  you  review  it  at  this  time.  In  this  section,  we'll  quickly  refresh  your  memory  and  give  some 
examples  of  combinatorial  applications  of  induction.  Other  examples  can  be  found  among  the  proofs 
in  previous  chapters.  (See  the  index  under  "induction"  for  a  listing  of  the  pages.) 

We  recall  the  theorem  on  induction  and  some  related  definitions: 

Theorem 7.1 Induction Let $A(m)$ be an assertion, the nature of which is dependent on
the integer $m$. Suppose that we have proved $A(n)$ for $n_0 \le n \le n_1$ and the statement

"If $n > n_1$ and $A(k)$ is true for all $k$ such that $n_0 \le k < n$, then $A(n)$ is true."

Then $A(m)$ is true for all $m \ge n_0$.

Definition 7.1 The statement "$A(k)$ is true for all $k$ such that $n_0 \le k < n$" is called the
induction assumption or induction hypothesis and proving that this implies $A(n)$ is called
the inductive step. The cases $n_0 \le n \le n_1$ are called the base cases.

Proof: We now prove the theorem. Suppose that $A(n)$ is false for some $n \ge n_0$. Let $m$ be the least
such $n$. We cannot have $m \le n_1$ because one of our hypotheses is that $A(n)$ has been proved for
$n_0 \le n \le n_1$. On the other hand, since $m$ is as small as possible, $A(k)$ is true for $n_0 \le k < m$. By
the inductive step, $A(m)$ is also true, a contradiction. Hence our assumption that $A(n)$ is false for
some $n$ is itself false; in other words, $A(n)$ is never false. □

Example 7.1   The parity of binary trees   The numbers $b_n$, $n \ge 1$, given by

$$b_1 = 1 \quad\text{and}\quad b_n = b_1 b_{n-1} + b_2 b_{n-2} + \cdots + b_{n-1} b_1 \ \text{ for } n > 1 \tag{7.1}$$

count the number of "unlabeled full binary RP-trees." We prove this recursion in Example 7.10 and
study these trees more in Section 9.3 (p. 259). For now, all that matters is (7.1), not what the $b_n$
count.

Using the definitions, we compute the first few values:

$$b_1 = 1 \quad b_2 = 1 \quad b_3 = 2 \quad b_4 = 5 \quad b_5 = 14 \quad b_6 = 42 \quad b_7 = 132.$$

Most values appear to be even. If you compute $b_8$, you will discover that it is odd. Since $b_1$, $b_2$, $b_4$
and $b_8$ are the only odd values with $n \le 8$, we conjecture that $b_n$ is odd if and only if $n$ is a power
of 2. Call the conjecture $A(n)$. How should we choose $n_0$ and $n_1$? Since the recursion in (7.1) is only
valid for $n > 1$, the case $n = 1$ appears special. Thus we try letting $n_0 = n_1 = 1$ and using the
recursion for $n > 1$.

Since $b_1 = 1$ is odd, $A(1)$ is true. We now consider $n > 1$. If $n$ is odd, let $k = (n - 1)/2$ and note
that we can write the recursion as

$$b_n = 2(b_1 b_{n-1} + b_2 b_{n-2} + \cdots + b_k b_{k+1}).$$

Hence $b_n$ is even and no induction was needed. Now suppose $n$ is even and let $k = n/2$. Now our
recursion becomes

$$b_n = 2(b_1 b_{n-1} + b_2 b_{n-2} + \cdots + b_{k-1} b_{k+1}) + b_k^2.$$

Hence $b_n$ is odd if and only if $b_k = b_{n/2}$ is odd. By the induction assumption, $b_{n/2}$ is odd if and only
if $n/2$ is a power of 2. Since $n/2$ is a power of 2 if and only if $n$ is a power of 2, we are done. □
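
Before attempting a proof, a conjecture like this one can be checked by machine. The following Python sketch computes $b_n$ directly from the recursion $b_1 = 1$, $b_n = b_1 b_{n-1} + \cdots + b_{n-1} b_1$ and compares the parity of $b_n$ with whether $n$ is a power of 2. The function names are our own, and the check is of course evidence, not a proof.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def b(n):
    """b_n from the recursion b_1 = 1, b_n = sum of b_j * b_(n-j) for 1 <= j < n."""
    if n == 1:
        return 1
    return sum(b(j) * b(n - j) for j in range(1, n))

def is_power_of_two(n):
    """True when n = 2^m for some integer m >= 0."""
    return n >= 1 and n & (n - 1) == 0

def conjecture_holds_up_to(limit):
    """Check 'b_n is odd if and only if n is a power of 2' for 1 <= n <= limit."""
    return all((b(n) % 2 == 1) == is_power_of_two(n) for n in range(1, limit + 1))
```

Caching the values with lru_cache matters here: without it, the same $b_j$ would be recomputed many times.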
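Example 7.2    The Fibonacci numbers    One definition of the Fibonacci numbers is

$$F_0 = 0, \quad F_1 = 1, \quad\text{and}\quad F_{n+1} = F_n + F_{n-1} \ \text{ for } n > 0. \tag{7.2}$$

We want to prove that

$$F_n = \frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{n} - \frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^{n} \quad\text{for } n \ge 0. \tag{7.3}$$

Let that be $A(n)$. Since (7.2) is our only information, we'll use it to prove (7.3). We must either
think of our induction in terms of proving $A(n + 1)$ or rewrite the recursion as $F_n = F_{n-1} + F_{n-2}$.
We'll use the latter approach. Since the recursion starts at $n + 1 = 2$, we'll have to prove $A(0)$ and
$A(1)$ separately. Hence $n_0 = 0$ and $n_1 = 1$ in Theorem 7.1. Since

$$\frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{0} - \frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^{0} = 0,$$

$A(0)$ is true. Since

$$\frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{1} - \frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^{1} = 1,$$

$A(1)$ is true.

Now for the induction. We want to prove (7.3) for $n > 1$. By the recursion, $F_n = F_{n-1} + F_{n-2}$.
Now use $A(n - 1)$ and $A(n - 2)$ to replace $F_{n-1}$ and $F_{n-2}$. Thus

$$F_n = \frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{n-1} - \frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^{n-1} + \frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{n-2} - \frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^{n-2},$$

and so we want to prove that

$$\frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{n} - \frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^{n} = \frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{n-1} - \frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^{n-1} + \frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{n-2} - \frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^{n-2}.$$

Consider the three terms that involve $1 + \sqrt5$. Divide by $\left(\frac{1+\sqrt5}{2}\right)^{n-2}$ and multiply by $\sqrt5$ to see that
they combine correctly if

$$\left(\frac{1+\sqrt5}{2}\right)^{2} = \frac{1+\sqrt5}{2} + 1,$$

which is true by simple algebra. The three terms with $1 - \sqrt5$ are handled similarly. □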
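
A quick numerical comparison of the recursion with the closed form is a useful sanity check on a formula of this kind. The Python sketch below assumes the closed form $F_n = \frac{1}{\sqrt5}\big(\frac{1+\sqrt5}{2}\big)^n - \frac{1}{\sqrt5}\big(\frac{1-\sqrt5}{2}\big)^n$ and uses floating point arithmetic, so it is only trustworthy for moderate $n$; the function names are our own.

```python
from math import sqrt

def fib(n):
    """F_n from the recursion F_0 = 0, F_1 = 1, F_(n+1) = F_n + F_(n-1)."""
    a, b = 0, 1                  # (F_0, F_1)
    for _ in range(n):
        a, b = b, a + b          # step from (F_k, F_(k+1)) to (F_(k+1), F_(k+2))
    return a

def closed_form(n):
    """The closed form evaluated in floating point (inexact for large n)."""
    root5 = sqrt(5)
    return ((1 + root5) / 2) ** n / root5 - ((1 - root5) / 2) ** n / root5

def formulas_agree(limit):
    """Compare the two for 0 <= n < limit, rounding away floating point error."""
    return all(round(closed_form(n)) == fib(n) for n in range(limit))
```

The inductive proof is what establishes the formula for all $n$; the code merely confirms that no sign or index has been copied incorrectly.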
Example 7.3  Disjunctive form for Boolean functions  We will consider functions with
domain $\{0,1\}^n$ and range $\{0,1\}$. A typical function is written $f(x_1, \ldots, x_n)$. These functions are
called Boolean functions on $n$ variables. With 0 interpreted as "false" and 1 as "true," we can think
of $x_1, \ldots, x_n$ as statements which are either true or false. In this case, $f$ can be thought of as
a complicated statement built from $x_1, \ldots, x_n$ which is true or false depending on the truth and
falsity of the $x_i$'s.

For $y_1, \ldots, y_k \in \{0, 1\}$, $y_1 y_2 \cdots y_k$ is normal multiplication; that is,

$$y_1 y_2 \cdots y_k = \begin{cases} 1, & \text{if } y_1 = y_2 = \cdots = y_k = 1; \\ 0, & \text{otherwise.} \end{cases}$$

Define

$$y_1 + y_2 + \cdots + y_k = \begin{cases} 0, & \text{if } y_1 = y_2 = \cdots = y_k = 0; \\ 1, & \text{otherwise.} \end{cases}$$

With the true-false interpretation, multiplication corresponds to "and" and + corresponds to "or."
Define $x' = 1 - x$, the complement of $x$.

A function $f$ is said to be written in disjunctive form if

$$f(x_1, \ldots, x_n) = A_1 + \cdots + A_k, \tag{7.4}$$

where each $A_j$ is the product of terms, each of which is either an $x_i$ or an $x_i'$. For example, let
$g(x_1, x_2, x_3)$ be 1 if exactly two of $x_1$, $x_2$ and $x_3$ are 1, and 0 otherwise. Then

$$g(x_1, x_2, x_3) = x_1 x_2 x_3' + x_1 x_2' x_3 + x_1' x_2 x_3$$

and

$$g(x_1, x_2, x_3)' = x_1' x_2' + x_1' x_3' + x_2' x_3' + x_1 x_2 x_3.$$

If $k = 0$ in (7.4) (i.e., no terms present), then it is interpreted to be 0 for all $x_1, \ldots, x_n$.
We will prove

Theorem 7.2    Every Boolean function can be written in disjunctive form.

Let $A(n)$ be the theorem for Boolean functions on $n$ variables. There are $2^2 = 4$ Boolean
functions on 1 variable. Here are the functions and disjunctive forms for them:

(f(0), f(1))    (0,0)          (0,1)      (1,0)       (1,1)
form            $x_1 x_1'$     $x_1$      $x_1'$      $x_1 + x_1'$

This proves $A(1)$.

For $n > 1$ we have

$$f(x_1, \ldots, x_n) = \big(g_0(x_1, \ldots, x_{n-1})\, x_n'\big) + \big(g_1(x_1, \ldots, x_{n-1})\, x_n\big), \tag{7.5}$$

where $g_k(x_1, \ldots, x_{n-1}) = f(x_1, \ldots, x_{n-1}, k)$. To see this, note that when $x_n = 0$ the right side of
(7.5) is $(g_0 \cdot 1) + (g_1 \cdot 0) = g_0 = f$ and when $x_n = 1$ it is $(g_0 \cdot 0) + (g_1 \cdot 1) = g_1 = f$.

By the induction assumption, both $g_0$ and $g_1$ can be written in disjunctive form, say

$$g_0 = A_1 + \cdots + A_a \quad\text{and}\quad g_1 = B_1 + \cdots + B_b. \tag{7.6}$$

We claim that

$$(C_1 + \cdots + C_c)\, y = C_1 y + \cdots + C_c y. \tag{7.7}$$

If this is true, then it can be used in connection with (7.6) in (7.5) to complete the inductive step.
To prove (7.7), notice that

(the left side of (7.7) equals 1)    if and only if    ($y = 1$ and some $C_i = 1$).

This is equivalent to

(the left side of (7.7) equals 1)    if and only if    (some $C_i y = 1$).

However,

(the right side of (7.7) equals 1)    if and only if    (some $C_i y = 1$).

This proves that the left side of (7.7) equals 1 if and only if the right side equals 1. Thus (7.7) is
true. □
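
The inductive proof of Theorem 7.2 is really a recursive procedure for producing a disjunctive form: split on the last variable, find forms for the two functions of one fewer variable, and attach the last variable or its complement to every term. The Python sketch below follows that idea under our own assumed encoding, which is not from the text: a term is a tuple of pairs (i, v), where (i, 1) stands for $x_i$ and (i, 0) for $x_i'$, and a disjunctive form is a list of terms.

```python
from itertools import product

def disjunctive_form(f, n):
    """Return a disjunctive form for the Boolean function f on n >= 1 variables."""
    if n == 1:
        base = {
            (0, 0): [((1, 1), (1, 0))],        # x_1 x_1', which is always 0
            (0, 1): [((1, 1),)],               # x_1
            (1, 0): [((1, 0),)],               # x_1'
            (1, 1): [((1, 1),), ((1, 0),)],    # x_1 + x_1'
        }
        return base[(f(0), f(1))]
    g0 = lambda *xs: f(*xs, 0)                 # f with the last variable set to 0
    g1 = lambda *xs: f(*xs, 1)                 # f with the last variable set to 1
    terms0 = disjunctive_form(g0, n - 1)
    terms1 = disjunctive_form(g1, n - 1)
    # Distribute x_n' over the terms of g0 and x_n over the terms of g1.
    return [t + ((n, 0),) for t in terms0] + [t + ((n, 1),) for t in terms1]

def evaluate(terms, xs):
    """Value of the form at xs: 'or' over terms of the 'and' of each term's literals."""
    return int(any(all(xs[i - 1] == v for i, v in term) for term in terms))

def exactly_two(x1, x2, x3):
    """The example function: 1 if exactly two arguments are 1, and 0 otherwise."""
    return int(x1 + x2 + x3 == 2)

def form_matches(f, n):
    """Check that the constructed form agrees with f on all 2^n inputs."""
    terms = disjunctive_form(f, n)
    return all(evaluate(terms, xs) == f(*xs) for xs in product((0, 1), repeat=n))
```

The form produced this way is usually far from the shortest one; it contains terms that are identically 0. The theorem only asserts that some disjunctive form exists.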

Suppose you have a result that you are trying to prove. If you are unable to do so, you might
try to prove a bit less because proving less should be easier. That is not always true for proofs by
induction. Sometimes it is easier to prove more! How can this be? The statement $A(n)$ is not just
the thing you want to prove, it is also the assumption that you have to help you prove $A(m)$ for
$m > n$. Thus, a stronger inductive hypothesis gives you more to prove and more to prove it with.
This should be emphasized:

Principle More may be better If the induction hypothesis seems too weak to carry out an
inductive proof, consider trying to prove a stronger theorem.

We've  already  encountered  this  in  proving  Theorem  6.2  (p.  153).  We  had  wanted  to  prove  that  every 
connected  graph  has  a  lineal  spanning  tree.  We  might  have  used 

$A_1(n)$: "If $G$ is an $n$-vertex connected graph, it has a lineal spanning tree."

Instead we used the stronger statement

$A_2(n)$: "If $G$ is an $n$-vertex connected graph containing the vertex $r$, it has a lineal spanning
tree with root $r$."

If you try to prove $A_1(n)$ by induction, you'll soon run into problems. Try it. The following example
illustrates the usefulness of generalizing the hypothesis for some inductive proofs.

Example 7.4  Graphs and Ramsey Theory  Let $k$ be a positive integer and let $G = (V, E)$
be an arbitrary simple graph. Can we find a subset $S \subseteq V$ such that $|S| = k$ and either

•  for all distinct $x, y \in S$, we have $\{x, y\} \in E$ or

•  for all distinct $x, y \in S$, we have $\{x, y\} \notin E$?

If $|V|$ is too small, e.g., $|V| < k$, the answer is obviously "No." Instead, we might ask, "Is there an
$N(k)$ such that there exists an $S$ with the above properties whenever $|V| \ge N(k)$?" You should be
able to see that, if we find some value which works for $N(k)$, then any larger value will also work.

It's easy to see that we can choose $N(2) = 2$: Pick any two $x, y \in V$, let $S = \{x, y\}$. Since $\{x, y\}$
is either an edge in $G$ or is not, we are done.

Let's try to show that $N(3)$ exists and find the smallest possible value we can choose for it.

You should find a simple graph $G$ with $|V| = 5$ for which the result is false when $k = 3$; that is,
for any set of three vertices in $G$ there is at least one pair that are joined by an edge and at least
one pair that are not joined by an edge. Having done this, you've shown that, if $N(3)$ exists it must
be greater than 5.

We now prove that we may take $N(3) = 6$. Select any $v \in V$. Of the remaining five or more
vertices in $V$ there must be at least three that are joined to $v$ or at least three that are not joined
to $v$. We do the first case and leave the latter for you. Let $x_1$, $x_2$ and $x_3$ be three vertices joined
to $v$. If $\{x_i, x_j\} \in E$ for some $i \ne j$, then all pairs of vertices in $\{v, x_i, x_j\}$ are joined by edges and we are done. If
$\{x_i, x_j\} \notin E$ for all $i$ and $j$, then none of the pairs of vertices in $\{x_1, x_2, x_3\}$ are joined by edges and,
again, we are done. We have shown that we may take $N(3) = 6$.
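
Both claims about $k = 3$ are small enough to confirm by exhaustive search: every simple graph on 6 vertices contains three vertices that are either all joined or all unjoined, while some graph on 5 vertices does not. The Python sketch below is only such a check, with function names of our own choosing; it tests 6-vertex graphs only (larger graphs follow by restricting to any 6 of their vertices) and does not say which 5-vertex graph fails, so finding one is still left to you.

```python
from itertools import combinations

def has_homogeneous_set(n, edges, k):
    """True if some k-subset of {0, ..., n-1} is all joined or all unjoined."""
    for S in combinations(range(n), k):
        joined = [frozenset(pair) in edges for pair in combinations(S, 2)]
        if all(joined) or not any(joined):
            return True
    return False

def all_graphs(n):
    """Yield the edge set of every simple graph with vertex set {0, ..., n-1}."""
    possible = [frozenset(pair) for pair in combinations(range(n), 2)]
    for mask in range(1 << len(possible)):
        # Bit b of mask decides whether the b-th possible edge is present.
        yield {e for b, e in enumerate(possible) if (mask >> b) & 1}

def every_graph_works(n, k):
    """True if every simple graph on n vertices has a homogeneous k-set."""
    return all(has_homogeneous_set(n, E, k) for E in all_graphs(n))
```

With 6 vertices there are $2^{15}$ graphs to examine, which takes a moment; the same brute force is hopeless for much larger $k$, which is one reason the smallest possible values are so hard to determine.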

Since the proof that $N(3)$ exists involved reduction to a smaller situation, it suggests that we
might be able to prove the existence of $N(k)$ by induction on $k$. How would this work? Here's a brief
sketch. We'd select $v \in V$ and note the existence of a large enough set all of whose vertices were
either joined to $v$ by edges or not joined to $v$. As above, we could assume the former case. We now
want to know that there exist either $k - 1$ vertices all joined to each other or $k$ vertices none of which are joined to each other. This
requires the induction assumption, but we are stuck because we are looking for either a set of size
$k - 1$ or one of size $k$, which are two different sizes. We can get around this problem by strengthening
the statement of the theorem to allow two different sizes. Here's the theorem we'll prove.

Theorem 7.3  A special case of Ramsey's Theorem    There exists a function $N(k_1, k_2)$,
defined for all positive integers $k_1$ and $k_2$, such that, for all simple graphs $G = (V, E)$ with at
least $N(k_1, k_2)$ vertices, there is a set $S \subseteq V$ such that either

•  $|S| = k_1$ and $\{x, y\} \in E$ for all $x \ne y$ both in $S$, or

•  $|S| = k_2$ and $\{x, y\} \notin E$ for all $x \ne y$ both in $S$.

In fact, we may define an acceptable $N(k_1, k_2)$ recursively by

$$N(k_1, k_2) = \begin{cases} 1, & \text{if } k_1 = 1; \\ 1, & \text{if } k_2 = 1; \\ N(k_1 - 1, k_2) + N(k_1, k_2 - 1) + 1, & \text{otherwise.} \end{cases} \tag{7.8}$$

If you have mistakenly assumed that $N(k_1, k_2)$ is uniquely defined (an easy error to make), the
phrase "an acceptable $N(k_1, k_2)$" in the theorem probably bothers you. Look back over our earlier
discussion of $N(k)$, which is the case $k_1 = k_2$. We said that $N(k)$ was any number such that if we
had at least $N(k)$ vertices something was true, and we observed that if some value worked for $N(k)$,
any larger value would also work. Of course we could look for the smallest possible choice for $N(k)$.
We found that this is 2 when $k = 2$ and is 6 when $k = 3$. The theorem does not claim that the
recursion (7.8) gives the smallest possible choice for $N(k_1, k_2)$. In fact, it tends to give numbers that
are much too big. Since we showed earlier that the smallest possible value for $N(3, 3)$ is 6, you might
mistakenly think that finding the smallest is easy. In fact, the smallest possible value of $N(k_1, k_2)$ is
unknown for almost all $(k_1, k_2)$.

Proof:    We'll use induction on $n = k_1 + k_2$.

Before starting the induction step, we'll do the case in which $k_1 = 1$ or $k_2 = 1$ (or both). If
$k_1 = 1$, choose $s \in V$ and set $S = \{s\}$. The theorem is trivially true because there are no $x \ne y$ in
$S$. Similarly, it is trivially true if $k_2 = 1$.

We now carry out the inductive step. By the previous paragraph, we can assume that $k_1 > 1$
and $k_2 > 1$. Choose $v \in V$ and define

$$V_1 = \{x \in V - \{v\} : \{x, v\} \in E\} \quad\text{and}\quad V_2 = \{x \in V - \{v\} : \{x, v\} \notin E\}.$$

Since $|V_1| + |V_2| = |V| - 1 \ge N(k_1, k_2) - 1$, it follows by (7.8) that either $|V_1| \ge N(k_1 - 1, k_2)$ or $|V_2| \ge N(k_1, k_2 - 1)$. We assume the former.
(The argument for the latter case would be very similar to the one we are about to give.)

Look at the graph $(V_1, E \cap \mathcal{P}_2(V_1))$. Since $|V_1| \ge N(k_1 - 1, k_2)$, it follows from the inductive
hypothesis that there is a set $S' \subseteq V_1$ such that either

•  $|S'| = k_1 - 1$ and $\{x, y\} \in E$ for all $x \ne y$ both in $S'$, or

•  $|S'| = k_2$ and $\{x, y\} \notin E$ for all $x \ne y$ both in $S'$.

If the former is true, let $S = S' \cup \{v\}$; otherwise, let $S = S'$. This completes the proof.

A more general form of Ramsey's Theorem asserts that there exists a function $N_r(k_1, \ldots, k_d)$
such that for all $V$ with $|V| \ge N_r(k_1, \ldots, k_d)$ and all $f$ from $\mathcal{P}_r(V)$ to $\underline{d}$, there exists an $i \in \underline{d}$ and a
set $S \subseteq V$ such that $|S| = k_i$ and $f(e) = i$ for all $e \in \mathcal{P}_r(S)$. The theorem we proved is the special
case $N_2(k_1, k_2)$ and $f(e)$ is 1 or 2 according as $e \in E$ or $e \notin E$. Although the more general statement
looks fairly complicated, it's no harder to prove than the special case, provided you don't get lost
in all the notation. You might like to try proving it. □
Exercises


In  these  exercises,  indicate  clearly 

(i)  what  A{n)  is, 

(ii)  what  the  inductive  step  is  and 

(iii)  where  the  inductive  hypothesis  is  used. 

7.1.1.  Indicate  (i)-(iii)  in  the  proof  of  the  rank  formula  (Theorem  3.1  (p.  76)). 

7.1.2.  Indicate  (i)-(iii)  in  the  proof  of  the  greedy  algorithm  for  unranking  in  Section  3.2  (p.  80). 

7.1.3.  Do  Exercise  1.3.11. 

7.1.4. For $n \ge 0$, let $D_n$ be the number of derangements (permutations with no fixed points) of an $n$-set.
By convention, $D_0 = 1$. The next few values are $D_1 = 0$, $D_2 = 1$, $D_3 = 2$ and $D_4 = 9$. Here are some
statements about $D_n$.

(i) $D_n = n D_{n-1} + (-1)^n$ for $n \ge 1$.

(ii) $D_n = (n - 1)(D_{n-1} + D_{n-2})$ for $n \ge 2$.

(iii) $D_n = n! \sum_{k=0}^{n} \dfrac{(-1)^k}{k!}$.

(a)  Use  (i)  to  prove  (ii).  (Induction  is  not  needed.) 

(b)  Use  (ii)  to  prove  (i). 

(c)  Use  (i)  to  prove  (iii). 

(d)  Use  (iii)  to  prove  (i).  (Induction  is  not  needed) 
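
Before proving relations like these, it can be reassuring to confirm numerically that they describe the same numbers. The Python sketch below assumes the three statements read $D_n = nD_{n-1} + (-1)^n$, $D_n = (n-1)(D_{n-1} + D_{n-2})$ and $D_n = n!\sum_{k=0}^{n} (-1)^k/k!$, and compares each with a direct count of derangements. The function names are our own, and the check is no substitute for the proofs requested.

```python
from functools import lru_cache
from itertools import permutations
from math import factorial

def count_derangements(n):
    """Count permutations of {0, ..., n-1} with no fixed point by brute force."""
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))

@lru_cache(maxsize=None)
def d_one_term(n):
    """Recursion of statement (i), started from D_0 = 1."""
    return 1 if n == 0 else n * d_one_term(n - 1) + (-1) ** n

@lru_cache(maxsize=None)
def d_two_term(n):
    """Recursion of statement (ii), started from D_0 = 1 and D_1 = 0."""
    if n < 2:
        return 1 - n
    return (n - 1) * (d_two_term(n - 1) + d_two_term(n - 2))

def d_sum(n):
    """Sum of statement (iii); n!/k! is an integer, so the arithmetic is exact."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))

def statements_agree(limit):
    """Compare all four computations for 0 <= n < limit."""
    return all(
        count_derangements(n) == d_one_term(n) == d_two_term(n) == d_sum(n)
        for n in range(limit)
    )
```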

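7.1.5. Write the following Boolean functions in disjunctive form. The functions are given in two-line form.

(a) $\begin{pmatrix} 0,0 & 0,1 & 1,0 & 1,1 \\ 0 & 1 & 0 & 0 \end{pmatrix}$.

(b) $\begin{pmatrix} 0,0 & 0,1 & 1,0 & 1,1 \\ 0 & 1 & 1 & 0 \end{pmatrix}$.

(c) $\begin{pmatrix} 0,0,0 & 0,0,1 & 0,1,0 & 0,1,1 & 1,0,0 & 1,0,1 & 1,1,0 & 1,1,1 \\ 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 \end{pmatrix}$.

(d) $\begin{pmatrix} 0,0,0 & 0,0,1 & 0,1,0 & 0,1,1 & 1,0,0 & 1,0,1 & 1,1,0 & 1,1,1 \\ 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \end{pmatrix}$.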

7.1.6. Write the following Boolean functions in disjunctive form.

(a) $(x_1 + x_3)(x_2 + x_4)$.

(b) $(x_1 + x_2)(x_1 + x_3)(x_2 + x_3)$.

7.1.7. A Boolean function $f$ is written in conjunctive form if $f = A_1 A_2 \cdots$, where $A_i$ is the "or" of terms
each of which is either an $x_i$ or an $x_i'$. Prove that every Boolean function can be written in conjunctive
form.

Hint.  The  proof  parallels  that  in  Example  7.3.  The  hardest  part  is  probably  finding  the  equation  that 
replaces  (7.5). 

7.1.8. Part II contains a variety of results that are proved by induction. Some appear in the text and some
in the exercises. Write careful inductive proofs for each of the following.

(a) Every connected graph has a lineal spanning tree.

(b) The number of ways to color a simple graph $G$ with $x$ colors is a polynomial in $x$. (Do this by
deletion and contraction.)

(c) Euler's relation: $v - e + f = 2$.

(d) Every planar graph can be colored with 5 colors.

(e) Using the fact that every tree has a leaf, prove that an $n$-vertex tree has exactly $n - 1$ edges.

(f) Every $n$-vertex connected graph has at least $n - 1$ edges.

*7.1.9. Using the definition of the Fibonacci numbers in Example 7.2, prove that

$$ F_{n+k+1} = F_{n+1} F_{k+1} + F_n F_k \quad \text{for } k > 0 \text{ and } n > 0. $$

Do not use formula (7.3).

Hint. You may find it useful to note that n + k + 1 = (n - 1) + (k + 1) + 1.

7.2   Thinking  Recursively 


A  recursive  formula  tells  us  how  to  compute  the  value  we  are  interested  in  terms  of  earlier  ones. 
(The  "earliest"  values  are  specified  separately.)  How  many  can  you  recall  from  previous  chapters? 
A  recursive  definition  describes  more  complicated  instances  of  a  concept  in  terms  of  simpler  ones. 
(The  "simplest"  instances  are  specified  separately.)  These  are  examples  of  the  recursive  approach, 
which  we  defined  at  the  beginning  of  this  part: 

Definition 7.2 Recursive approach A recursive approach to a problem consists of two parts:

1. The problem is reduced to one or more problems of the same kind which are simpler in
some sense.

2. There is a set of simplest problems to which all others are reduced after one or more steps.
Solutions to these simplest problems are given.

This definition focuses on tearing down (reduction to simpler cases). Sometimes it may be easier
or better to think in terms of building up (construction of bigger cases). We can simply turn
Definition 7.2 on its head:

Definition 7.3 Recursive solution We have a recursive solution to the problem (proof,
algorithm, data structure, etc.) if the following two conditions hold.

1. The set of simplest problems can be dealt with (proved, calculated, sorted, etc.).

2. The solution to any other problem can be built from solutions to simpler problems, and
this process eventually leads back to the simplest problems.

Let's  look  briefly  at  some  examples  where  recursion  can  be  used.  Suppose  that  we  are  given  a 
collection  of  things  and  a  problem  associated  with  them.  Examples  of  things  and  problems  are 

• assertions A(n) that we want to prove;

• the binomial coefficients C(n, k) that we want to compute using
C(n, k) = C(n-1, k) + C(n-1, k-1) from Section 1.4;

• the recursion $D_n = (n-1)(D_{n-1} + D_{n-2})$ for derangements that we want to prove by a direct
combinatorial argument;

•  lists  to  sort; 

•  RP-trees  that  we  want  to  define. 

Suppose we have some binary relation between these things which we'll denote by "simpler than."
If  there  is  nothing  simpler  than  a  thing  X,  we  call  X  "simplest."  There  may  be  several  simplest 
things.  In  the  examples  just  given,  we'll  soon  see  that  the  following  notions  are  appropriate. 

• A(n) is simpler than A(m) if n < m and A(1) is the simplest thing.

• C(n, k) is simpler than C(m, j) if n < m and the C(0, k)'s are the simplest things.

• $D_n$ is simpler than $D_m$ if n < m and the simplest things are $D_0$ and $D_1$.

• One list is simpler than another if it contains fewer items and the lists with one item are the
simplest things.

• One tree is simpler than another if it contains fewer vertices and the one-vertex tree is the simplest.

Example 7.5 The induction theorem The induction theorem in the previous section solves
the problem of proving A(n) recursively. There is only one simplest problem: A(1). We are usually
taking a reduction viewpoint when we prove something by induction. □

Example 7.6 Calculating binomial coefficients Find a method for calculating the binomial
coefficients C(n, k). As indicated above, we let the simplest values be those with n = 0. From
Chapter 1 we have

$$ C(n, k) = C(n-1, k-1) + C(n-1, k). $$

This solves the problem recursively. □
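
The recursion can be read directly as a procedure. The following Python sketch is one way to do
that. The text only says that the simplest values are those with n = 0; the specific values
C(0, 0) = 1 and C(0, k) = 0 for k ≠ 0 used below are the standard ones and are an assumption
here, as are the name `binomial` and the use of memoization to avoid recomputing values.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def binomial(n: int, k: int) -> int:
    """C(n, k) via C(n, k) = C(n-1, k-1) + C(n-1, k), with n = 0 as the simplest case."""
    if n == 0:
        # Simplest values (assumed): C(0, 0) = 1 and C(0, k) = 0 for k != 0.
        return 1 if k == 0 else 0
    return binomial(n - 1, k - 1) + binomial(n - 1, k)


print(binomial(15, 7))  # 6435
```

Each call reduces n by one, so every chain of calls reaches a simplest value after n steps.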

Is the derivation of the binomial coefficient recursion done by reduction or construction? We can
derive it by dividing the k-subsets of {1, ..., n} into those that contain n and those that do not. This can
be regarded as reduction or construction. Such ambiguities are common because the two concepts
are simply different facets of the same thing. Nevertheless, it is useful to explore reduction versus
construction in problems so as to gain facility with solving problems recursively. We do this now for
derangements.

Example 7.7 A recursion for derangements A derangement is a permutation without fixed
points and $D_n$ is the number of derangements of an n-set. In Exercise 7.1.4 the recursion

$$ D_n = (n-1)(D_{n-1} + D_{n-2}) \quad \text{for } n \ge 2 \tag{7.9} $$

with initial conditions $D_0 = 1$ and $D_1 = 0$ was stated without proof. We now give a derivation using
reduction and construction arguments.

Look at a derangement of {1, ..., n} in cycle form. Since a derangement has no fixed points, no cycles
have length 1. We look at the cycle of the derangement that contains n.

(a) If this cycle has length 2, throw out the cycle.

(b) If this cycle has length greater than 2, remove n from the cycle.

In case (a), suppose k is in the cycle with n. We obtain every derangement of {1, ..., n} - {k, n} exactly
once. Since there are n - 1 possibilities for k, (a) contributes $(n-1)D_{n-2}$ to the count. This is a
reduction point of view.

In case (b), we obtain derangements of {1, ..., n-1}. To find the contribution of (b), it may be easier
to take a construction view: Given a derangement of {1, ..., n-1} in cycle form, we choose which of the
n - 1 elements to insert n after. This gives a contribution of $(n-1)D_{n-1}$.

The initial conditions and the range of n for which (7.9) is valid can be found by examining our
argument. There are two approaches:

• We could take the view that derangements only make sense for n ≥ 1 and so (7.9) is used when
n ≥ 3, with initial conditions $D_1 = 0$ and $D_2 = 1$.

• We could look at the argument used to derive the recursion and ask how we should define $D_n$
for n < 1 so that the argument makes sense. Note that for n = 1, the values of $D_0$ and $D_{-1}$
don't matter since the recursion gives

$$ D_1 = (1-1)(D_0 + D_{-1}) = 0(D_0 + D_{-1}) = 0, $$

which is correct. What about n = 2? We look at (a) and (b) separately.

(a) We want to get the derangement (1,2), so we need $D_0 = 1$.

(b) This should give zero since there is no derangement of {1, 2} containing a cycle of length exceeding
2. Thus we need $D_1 = 0$, which we have.

To summarize, we can use (7.9) for n ≥ 1 with the initial condition $D_0 = 1$. □
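
As a sanity check on this summary, here is a small Python sketch that evaluates recursion (7.9)
and compares it with a brute-force count of permutations without fixed points. The function
names are illustrative, and the brute-force check is an addition for verification only, not part
of the derivation.

```python
from functools import lru_cache
from itertools import permutations


@lru_cache(maxsize=None)
def derangements(n: int) -> int:
    """D_n from recursion (7.9) with the initial condition D_0 = 1."""
    if n == 0:
        return 1
    if n == 1:
        # (1 - 1) * (D_0 + D_{-1}) = 0 whatever value D_{-1} is given.
        return 0
    return (n - 1) * (derangements(n - 1) + derangements(n - 2))


def count_derangements_brute_force(n: int) -> int:
    """Count permutations of 0..n-1 with no fixed point by direct enumeration."""
    return sum(
        all(image != point for point, image in enumerate(perm))
        for perm in permutations(range(n))
    )


for n in range(8):
    assert derangements(n) == count_derangements_brute_force(n)
print([derangements(n) for n in range(8)])  # [1, 0, 1, 2, 9, 44, 265, 1854]
```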

Example  7.8    Merge  sorting     Merge  sorting  can  be  described  as  follows. 

1.  The  lists  containing  just  one  item  are  the  simplest  and  they  are  already  sorted. 

2. Given a list of n > 1 items, choose k with 1 ≤ k < n, sort the first k items, sort the last n - k
items and merge the two sorted lists.

This algorithm builds up a way to sort an n-list out of procedures for sorting shorter lists. Note
that we have not specified how the first k or last n - k items are to be sorted; we simply assume
that it has been done. Of course, an obvious way to do this is to simply apply our merge sorting
algorithm to each of these sublists.

Let's  implement  the  algorithm  using  people  rather  than  a  computer.  Imagine  training  a  large 
number  of  obedient  people  to  carry  out  two  tasks:  splitting  a  list  for  other  people  to  sort  and  merging 
two  lists.  We  give  one  person  the  unsorted  list  and  tell  him  to  sort  it  using  the  algorithm  and  return 
the  result  to  us. 

What  happens?  Anyone  who  has  a  list  with  only  one  item  returns  it  unchanged  to  the  person  he 
received  it  from.  This  is  Case  1  in  Definition  7.3  (p.  204)  (and  also  in  the  algorithm).  Anyone  with 
a  list  having  more  than  one  item  splits  it  and  gives  each  piece  to  a  person  who  has  not  received  a 
list,  telling  each  person  to  sort  it  and  return  the  result.  When  the  results  have  been  returned,  this 
person  merges  the  two  lists  and  returns  the  result  to  whoever  gave  him  the  list.  If  there  are  enough 
obedient  people  around,  we'll  eventually  get  our  answer  back. 

Notice that no one needs to pay any attention to what anyone else is doing to a list. □
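
The description leaves the choice of k open, and the following Python sketch reflects that by
picking the split point at random. It is only an illustration under stated assumptions: the
function name `merge_sort` and the random choice of k are hypothetical, the merging step is
delegated to the standard library's `heapq.merge`, and the empty list is treated as an extra
simplest case.

```python
import random
from heapq import merge


def merge_sort(items: list, rng: random.Random) -> list:
    """Sort by splitting at an arbitrary k with 1 <= k < n, sorting both parts and merging."""
    n = len(items)
    if n <= 1:
        return list(items)           # simplest case: already sorted
    k = rng.randrange(1, n)          # any k with 1 <= k < n will do
    first = merge_sort(items[:k], rng)
    last = merge_sort(items[k:], rng)
    return list(merge(first, last))  # heapq.merge combines sorted inputs


data = [5, 2, 9, 1, 5, 6, 0]
print(merge_sort(data, random.Random(0)))  # [0, 1, 2, 5, 5, 6, 9]
```

Each recursive call plays the role of one of the obedient people: it only splits, hands the
pieces on, and merges what comes back.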

We  now  look  at  one  of  the  most  important  recursive  definitions  in  computer  science. 

Example  7.9  Defining  rooted  plane  trees  recursively  Rooted  plane  trees  (RP-trees)  were 
defined  in  Section  5.4  (p.  136).  Here  is  a  recursive  constructive  definition  of  RP-trees. 

•  A  single  vertex,  which  we  call  the  root,  is  an  RP-tree. 

• If $T_1, \ldots, T_k$ is an ordered list of RP-trees with roots $r_1, \ldots, r_k$ and no vertices in common, then
an RP-tree can be constructed by choosing an unused vertex r to be the root, letting its ith child
be $r_i$ and forgetting that $r_1, \ldots, r_k$ were called roots.

This  is  a  more  compact  definition  than  the  nonconstructive  one  given  in  Section  5.4.  This  approach 
to  RP-trees  is  very  important  for  computer  science.  We'll  come  back  to  it  in  the  next  section. 

We  should,  and  will,  prove  that  this  definition  is  equivalent  to  that  in  Section  5.4.  In  other 
words,  our  new  "definition"  should  not  be  regarded  as  a  definition  but,  rather,  as  a  theorem — you 
can  only  define  something  once! 

Define  an  edge  to  be  any  set  of  two  vertices  in  which  one  vertex  is  the  child  of  the  other.  Note 
that  the  recursive  definition  insures  that  the  graph  is  connected  and  the  use  of  distinct  vertices 
eliminates  the  possibility  of  cycles.  Thus,  the  "definition"  given  here  leads  to  a  rooted,  connected, 
simple  graph  without  loops.  Furthermore,  the  edges  leading  to  a  vertex's  sons  are  ordered.  Thus  we 
have  an  RP-tree.  To  actually  prove  this  carefully,  one  must  use  induction  on  the  number  of  vertices. 
This  is  left  as  an  exercise. 

It  remains  to  show  that  every  RP-tree,  as  defined  in  Section  5.4,  can  be  built  by  the  method 
described  in  the  recursive  "definition"  given  above.  One  can  use  induction  on  the  number  of  vertices. 
It  is  obvious  for  one  vertex.  Remove  the  root  vertex  and  note  that  each  child  of  the  root  now  becomes 
the  root  of  an  RP-tree.  By  the  induction  hypothesis,  each  of  these  can  be  built  by  our  recursive 
process. The recursive process allows us to add a new root whose children are the roots of these
trees, and this reconstructs the original RP-tree.

Here is another definition of an RP-tree.

•  A  single  vertex,  which  we  call  the  root,  is  an  RP-tree. 

• If $T_1$ and $T_2$ are RP-trees with roots $r_1$ and $r_2$ and no vertices in common, then an RP-tree can
be constructed by connecting $r_1$ to $r_2$ with an edge, making $r_2$ the root of the new tree and
making $r_1$ the leftmost child of $r_2$.

We leave it to you to prove that this is equivalent to the previous definition. □
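
To make the two constructions concrete, here is a minimal Python sketch. It assumes one
particular representation that the text does not prescribe: a vertex is the ordered tuple of the
subtrees rooted at its children, so vertices carry no names and the "no vertices in common"
condition holds automatically. The names `new_root`, `attach_leftmost` and `vertex_count` are
illustrative.

```python
from typing import Tuple

RPTree = Tuple["RPTree", ...]  # a vertex is the ordered tuple of its children's subtrees

LEAF: RPTree = ()  # the single-vertex RP-tree


def new_root(*subtrees: RPTree) -> RPTree:
    """First definition: a new root whose i-th child is the root of the i-th subtree."""
    return tuple(subtrees)


def attach_leftmost(t1: RPTree, t2: RPTree) -> RPTree:
    """Second definition: make the root of t1 the leftmost child of the root of t2."""
    return (t1,) + t2


def vertex_count(tree: RPTree) -> int:
    """Count vertices recursively: the root plus the vertices of each subtree."""
    return 1 + sum(vertex_count(child) for child in tree)


# A root with three children, the middle one having two children of its own.
middle = new_root(LEAF, LEAF)
by_first_definition = new_root(LEAF, middle, LEAF)

# The same tree via the second definition: attach the children from right to left.
by_second_definition = attach_leftmost(
    LEAF, attach_leftmost(middle, attach_leftmost(LEAF, LEAF))
)

print(by_first_definition == by_second_definition)  # True
print(vertex_count(by_first_definition))            # 6
```

This only illustrates one tree; it is not a proof that the two definitions are equivalent.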

Example 7.10 Recursions for rooted plane trees A rooted tree in which each non-leaf vertex
has exactly two edges leading away from the root is called a full binary tree. By replacing k in the
previous example with 2, we have a recursive definition of them: A full binary RP-tree is either a
single vertex or a new root vertex joined to two full binary RP-trees.

As noted in Section 1.4 (p. 32), recursive constructions lead to recursions. Let's use the previous
recursive definition to get a recursion for full binary trees. Suppose we require that any node that
is not a leaf have exactly two children. Let $b_n$ be the number of such trees that have n leaves. From
the recursive definition, we have $b_1 = 1$ for the single vertex tree. Since the recursive construction
gives us each tree exactly once, we have

$$ b_n = \sum_{j=1}^{n-1} b_j b_{n-j} \quad \text{for } n > 1. $$

To see why this is so, apply the Rules of Sum and Product: First, partition the problem according
to the number of leaves in $T_1$, which is the j in our formula. Second, for each case choose $T_1$ and
then choose $T_2$, which gives us the term $b_j b_{n-j}$.

If we try the same approach for general rooted plane trees, we have two problems. First, we had
better not count by leaves since there are an infinite number of trees with just one leaf, namely trees
of the form •-•- ··· -•. Second, the fact that the definition involves $T_1, \ldots, T_k$ where k can be
any positive integer makes the recursion messy: we'd have to sum over all such k and for each k we'd
have a product $t_{j_1} \cdots t_{j_k}$ to sum over all j's such that $j_1 + \cdots + j_k$ has the appropriate value.

The first problem is easy to fix: let $t_n$ be the number of rooted plane trees with n vertices. The
second problem requires a new recursive construction, which means we have to be clever. We use
the construction in the last paragraph of the previous example. We then have $t_1 = 1$ and, for n > 1,
$t_n = \sum_{j=1}^{n-1} t_j t_{n-j}$, because if the constructed tree has n vertices and $T_1$ has j vertices, then $T_2$ has
n - j vertices. Notice that $t_n$ and $b_n$ satisfy the same recursion with the same initial conditions.
Since the recursion lets us compute all values recursively, it follows that $t_n = b_n$. (Alternatively, you
could prove $b_n = t_n$ using the recursion and induction on n.) Actually, there is a slight gap here:
we didn't prove that the new recursive definition gives all rooted plane trees and gives them exactly
once. We leave it to you to convince yourself of this. □

A recursive algorithm is an algorithm that refers to itself when it is executing. As with any
recursive situation, when an algorithm refers to itself, it must be with "simpler" parameters so that
it eventually reaches one of the "simplest" cases, which is then done without recursion. Our recursion
for C(n, k) can be viewed as a recursive algorithm. Our description of merge sorting in Example 7.8
gives a recursive algorithm if the sorting required in Step 2 is done by using the algorithm itself.
Let's look at one more example illustrating the recursive algorithm idea.
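
As a brief aside before that example, the recursion for $b_n$ in Example 7.10 can be read as a
recursive algorithm in the same way. The Python sketch below assumes the recursion
$b_1 = 1$, $b_n = \sum_{j=1}^{n-1} b_j b_{n-j}$; the function name and the memoization are
illustrative choices. Since $t_n$ satisfies the same recursion with the same initial condition,
the same function also computes $t_n$.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def full_binary_trees(n: int) -> int:
    """b_n: the number of full binary RP-trees with n leaves."""
    if n == 1:
        return 1  # simplest case: the single vertex tree
    # Partition by j, the number of leaves in T_1; T_2 then has n - j leaves.
    return sum(full_binary_trees(j) * full_binary_trees(n - j) for j in range(1, n))


print([full_binary_trees(n) for n in range(1, 9)])  # [1, 1, 2, 5, 14, 42, 132, 429]
```

Every recursive call uses a strictly smaller argument, so each chain of calls ends at n = 1.
The values that appear are the Catalan numbers.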

Example 7.11 A recursive algorithm Suppose you are interested in listing all sequences of
length  eight,  consisting  of  four  zeroes  and  four  ones.  Suppose  that  you  have  a  friend  who  does  this 
sort  of  thing,  but  will  only  make  such  lists  if  the  length  of  the  sequence  is  seven  or  less.  "Nope," 
he  says,  "I  can't  do  it — the  sequence  is  too  long."  There  is  a  way  to  trick  your  friend  into  doing  it. 
First  give  him  the  problem  of  listing  all  sequences  of  length  seven  with  three  ones.  He  doesn't  mind, 
and  gives  you  the  list  1110000,  1011000,  0101100,  etc.  that  he  has  made.  You  thank  him  politely, 
sneak  off,  and  put  a  "1"  in  front  of  every  sequence  in  the  list  he  has  given  you  to  obtain  11110000, 
11011000,  10101100,  etc.  Now,  you  return  to  him  with  the  problem  of  listing  all  strings  of  length 
seven  with  four  ones.  He  returns  with  the  list  1111000,  0110110,  0011101,  etc.  Now  you  thank  him 
and  sneak  off  and  put  a  "0"  in  front  of  every  sequence  in  the  list  he  has  given  you  to  obtain  01111000, 
00110110, 00011101, etc. Putting these two lists together, you have obtained the list you originally
wanted. 

How  did  your  friend  produce  these  lists  that  he  gave  you?  Perhaps  he  had  a  friend  that  would 
only  do  lists  of  length  6  or  less,  and  he  tricked  this  friend  in  the  same  way  you  tricked  him!  Perhaps 
the "6 or less" friend had a "5 or less" friend that he tricked, etc. If you are sure that your friend
gave you a correct list, it doesn't really matter how he got it. □
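
The trick played on the friend translates into a short recursive procedure. The following Python
sketch is one possible rendering; the function name `sequences`, the choice of the empty sequence
as the simplest case, and the order in which the two partial lists are joined are assumptions
made for illustration.

```python
from typing import List


def sequences(length: int, ones: int) -> List[str]:
    """List all 0/1 strings of the given length containing the given number of ones."""
    if ones < 0 or ones > length:
        return []        # no such sequence exists
    if length == 0:
        return [""]      # simplest case: the empty sequence
    # Ask the "friend" for the shorter lists, then put a 1 or a 0 in front.
    with_one = ["1" + s for s in sequences(length - 1, ones - 1)]
    with_zero = ["0" + s for s in sequences(length - 1, ones)]
    return with_one + with_zero


result = sequences(8, 4)
print(len(result))  # 70
print(result[:3])   # ['11110000', '11101000', '11100100']
```

The caller never needs to know how the shorter lists were produced, only that they are correct.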

These  examples  are  rather  easy  to  follow,  but  what  happens  if  we  look  into  them  more  deeply?  We 
might  ask  just  how  C(15,  7)  is  calculated  in  terms  of  the  simplest  values  C(0,  k)  without  specifying 
any  of  the  intermediate  values.  We  might  ask  just  what  all  of  our  trained  sorters  are  doing.  We  might 
ask  how  your  friend  got  his  list  of  sequences. 

This kind of analysis is often tempting to do when we are debugging recursive algorithms. It is
almost  always  the  wrong  thing  to  do.  Asking  about  such  details  usually  leads  to  confusion  and  gets 
one  so  off  the  track  that  it  is  even  harder  to  convince  oneself  that  the  algorithm  is  correct. 

Why  is  it  unnecessary  to  "unwind"  the  recursion  in  this  fashion?  If  Case  2  of  our  recursive 
solution  as  given  by  Definition  7.3  (p.  204)  correctly  describes  what  to  do,  assuming  that  the  simpler 
problems  have  been  done  correctly,  then  our  recursive  solution  works!  This  can  be  demonstrated  the 
way  induction  was  proved:  If  the  solution  fails,  there  must  be  a  problem  for  which  it  fails  such  that  it 
succeeds  for  all  simpler  problems.  If  this  problem  is  simplest,  it  contradicts  Case  1  in  Definition  7.3. 
If  this  problem  is  not  simplest,  it  contradicts  Case  2  in  Definition  7.3  since  all  simpler  problems 
have  been  dealt  with.  Thus  our  assumption  that  the  solution  fails  has  led  to  a  contradiction.  It 
is  important  to  understand  this  proof  since  it  is  the  theoretical  basis  for  recursive  methods.  To 
summarize: 

Principle Thinking recursively Carefully verify the two parts of Definition 7.2 or of
Definition 7.3. Avoid studying the results of iterating the recursive solution.

If  you  are  just  learning  about  recursion,  you  may  find  it  difficult  to  believe  that  this  general 
strategy  will  work  without  seeing  particular  solutions  where  the  reduction  to  the  simplest  cases  is 
laid  out  in  full  detail.  In  even  a  simple  recursive  solution,  it  is  likely  that  you'll  become  confused 
by  the  details,  even  if  you're  accustomed  to  thinking  recursively.  If  you  agree  that  the  proof  in  the 
previous  paragraph  is  correct,  then  such  detail  is  not  needed  to  see  that  the  algorithm  is  correct.  It 
is  very  important  to  realize  this  and  to  avoid  working  through  the  stages  of  the  recursive  solution 
back  to  the  simplest  things. 

If  for  some  reason  you  must  work  backwards  through  the  recursive  stages,  do  it  gingerly  and 
carefully.  When  must  you  work  backwards  like  this? 

•  For  some  reason  you  may  be  skipping  over  an  error  in  your  algorithm  and  so  are  unable  to 
correct it. The unwinding process can help, probably not because it will help you find the error
directly  but  because  it  will  force  you  to  examine  the  algorithm  more  closely. 

•  You  may  wish  to  replace  the  recursive  algorithm  with  a  nonrecursive  one.  That  may  require  a 
much  deeper  understanding  of  what  happens  as  the  recursion  is  iterated. 

The approach of focusing on the two steps of an inductive solution is usually difficult for beginners
to  maintain.  Resist  the  temptation  to  abandon  it!  This  does  not  mean  that  you  should  avoid  details, 
but  the  details  you  should  concern  yourself  with  are  different: 

• "Is every solution built from simplest solutions, and have I handled the simplest solutions
properly?" If not, then the foundation of your recursively built edifice is rotten and the entire structure
will collapse.

•  "Is  my  description  of  how  to  use  simpler  solutions  to  build  up  more  complicated  ones  correct?" 

•  "If  this  is  an  algorithm,  have  I  specified  all  the  recursive  parameters?" 

This  last  point  will  be  dealt  with  in  the  next  section  where  we'll  discuss  implementation. 

This does not mean one should never look at the details of a recursion. There are at least
two situations in which one does so. First, one may wish to develop a nonrecursive algorithm.
Understanding  the  details  of  how  the  recursive  algorithm  works  may  be  useful.  Second,  one  may 
need  to  reduce  the  amount  of  storage  a  recursive  algorithm  requires. 


Exercises 


7.2.1. We will prove that all positive integers are equal. Let A(n) be the statement "All positive integers
that do not exceed n are equal." In other words, "If p and q are integers between 1 and n inclusive,
then p = q." Since 1 is the only positive integer not exceeding 1, A(1) is true. For n > 1, we now
assume A(n - 1) and prove A(n). If p and q are positive integers not exceeding n, let p' = p - 1 and
q' = q - 1. Since p' and q' do not exceed n - 1, we have p' = q' by A(n - 1). Thus p = q. This proves
A(n). Where is the error?

7.2.2. What is wrong with the following proof that every graph can be drawn in the plane in such a way
that no edges intersect? Let A(n) be the statement for all graphs with n vertices. Clearly A(1) is
true. Let G be a graph with vertices $v_1, \ldots, v_n$. Let $G_1$ be the subgraph induced by $v_2, \ldots, v_n$ and
let $G_n$ be the subgraph induced by $v_1, \ldots, v_{n-1}$. By the induction assumption, we can draw both
$G_1$ and $G_n$ in the plane. After drawing $G_n$, add the vertex $v_n$ near $v_1$ and use the drawing of $G_1$ to
see how to connect $v_n$ to the other vertices.

7.2.3.  What  is  wrong  with  the  following  proof  that  all  positive  integers  are  interesting?  Suppose  the  claim 
is  false  and  let  n  be  the  smallest  positive  integer  which  is  not  interesting.  That  is  an  interesting  fact 
about  n,  so  n  is  interesting! 

7.2.4.  What  is  wrong  with  the  following  method  for  doing  this  exercise?  Ask  someone  else  in  the  class  who 
will  tell  you  the  answer  if  he/she  knows  it.  If  that  person  knows  it,  you  are  done;  otherwise  that 
person can use this method to find the answer and so you are done anyway.

Remark:  Of  course  it  could  be  wrong  morally  because  it  may  be  cheating.  For  this  exercise,  you 
should  find  another  reason. 

7.2.5.  This  relates  to  Example  7.9.  Fill  in  the  details  of  the  proof  of  the  equivalence  of  the  two  definitions 
of  RP-trees. 

7.3 Recursive Algorithms


We'll  begin  this  section  by  using  merge  sort  to  illustrate  how  to  obtain  information  about  a  recursive 
algorithm.  In  this  case  we'll  look  at  proof  of  correctness  and  a  recursion  for  running  time.  Next  we'll 
turn  our  attention  to  the  local  description  of  a  recursive  procedure.  What  are  the  advantages  of 
thinking  locally? 

•  Simplicity:  By  thinking  locally,  we  can  avoid  the  quagmire  that  often  arises  in  attempting  to 
unravel  the  details  of  the  recursion.  To  avoid  the  quagmire:  Think  locally,  but  remember  to  deal 
with  initial  conditions. 

•  Implementation:  A  local  description  lays  out  in  graphical  form  a  plan  for  coding  up  a  recursive 
algorithm. 

•  Counting:  One  can  easily  develop  a  recursion  for  counting  structures,  operations,  etc. 

•  Proofs:  A  local  description  lays  out  the  plan  for  an  inductive  proof. 

Finally,  we'll  turn  our  attention  to  the  problem  of  how  recursive  algorithms  are  actually  implemented 
on  a  computer.  If  you  are  not  programming  recursive  algorithms  at  present,  you  may  think  of  the 
implementation  discussion  as  an  extended  programming  note  and  file  it  away  for  future  reference 
after  skimming  it. 

Obtaining  Information:  Merge  Sorting 


Here's  an  algorithm  for  "merge  sorting"  the  sequence  s  and  storing  the  answer  in  the  sequence  t. 

Procedure SORT(s_1, ..., s_n into t_1, ..., t_n)
    If (n = 1)
        t_1 = s_1
        Return
    End if
    Let m be n/2 with remainder discarded
    SORT(s_1, ..., s_m into u_1, ..., u_m)
    SORT(s_{m+1}, ..., s_n into v_1, ..., v_{n-m})
    MERGE(sequences u and v into t)
    Return
End

How do we know it doesn't run forever (an "infinite loop")? How do we know it's correct? How
long does it take to run? As we'll see, we can answer such questions by making modifications to the
algorithm.

The  infinite  loop  question  can  be  dealt  with  by  verifying  the  conditions  of  Definition  7.2  (p.  204). 
For  the  present  algorithm,  the  complexity  of  the  problem  is  the  length  of  the  sequence  and  the 
simplest  case  is  a  1-long  sequence.  The  algorithm  deals  directly  with  the  simplest  case.  Other  cases 
are  reduced  to  simpler  ones  because  a  list  is  divided  into  shorter  lists. 

first     1367    1367    1367    1367    1367     1367
second    245     245     245     245     245      245
output    1       12      123     1234    12345    1234567

Figure 7.1 Merging two sorted lists. The first and second lists are shown with their pointers at each step.


Example  7.12  The  merge  sort  algorithm  is  correct  One  way  to  prove  program  correctness 
is  to  insert  claims  into  the  code  and  then  prove  that  the  claims  are  correct.  For  recursive  algorithms, 
this requires induction. We'll assume that the MERGE algorithm is known to be correct. (Proving that
is another problem.) Here's our code with comments added for proving correctness.

Procedure SORT(s_1, ..., s_n into t_1, ..., t_n)
    If (n = 1)
        t_1 = s_1                                       /* t is sorted */
        Return
    End if
    Let m be n/2 with remainder discarded
    SORT(s_1, ..., s_m into u_1, ..., u_m)              /* u is sorted */
    SORT(s_{m+1}, ..., s_n into v_1, ..., v_{n-m})      /* v is sorted */
    MERGE(sequences u and v into t)                     /* t is sorted */
    Return
End

We now use induction on n to prove

    A(n) = "When a comment is reached in sorting an n-list, it is true."

For n = 1, only the first comment is reached and it is clearly true since there is only one item in the
list. For n > 1, the claims about u and v are true by A(m) and A(n - m). Also, the claim about t
is true by the assumption that MERGE runs correctly. □

Example 7.13 The running time for a merge sort How long does the merge sort algorithm
take to run? Let's ignore the overhead in computing m, subroutine calling and so forth and focus
on the part that takes the most time: merging.

Suppose that $u_1 \le \cdots \le u_i$ and $v_1 \le \cdots \le v_j$ are two sorted lists. We can merge these two
ordered lists into one ordered list very simply by moving pointers along the two lists and comparing
the elements being pointed to. The smaller of the two elements is output and the pointer in its list
is advanced. (The decision as to which to output at a tie is arbitrary.) When one list is used up,
simply output the remainder of the other list. The sequence of operations for the lists 1,3,6,7 and 2,4,5
is shown in Figure 7.1. Since each comparison results in at least one output and the last output is
free, we require at most i + j - 1 comparisons to merge the lists $u_1 \le \cdots \le u_i$ and $v_1 \le \cdots \le v_j$.
On the other hand, if one list is output before any of the other list, we might use only min(i, j)
comparisons.

Let C(n) be an upper bound on the number of comparisons needed to merge sort a list of n
things. Clearly C(1) = 0. The number of comparisons needed to merge two lists with a total of n
items is at most n - 1. We can convert our sorting procedure into one for computing C(n). All we
need to do is replace SORT with C and add up the various counts. Here's the result.

Procedure C(n)
    C = 0
    If (n = 1), then Return C
    Let m be n/2 with remainder discarded
    C = C + C(m)
    C = C + C(n-m)
    C = C + (n-1)
    Return C
End


To make things easier for ourselves, let's just look at lengths which are powers of two so that the
division comes out even. Then C(1) = 0 and $C(2^k) = 2C(2^{k-1}) + 2^k - 1$. By applying the recursion
we get

    C(2) = 2·0 + 1 = 1,       C(4) = 2·1 + 3 = 5,
    C(8) = 2·5 + 7 = 17,      C(16) = 2·17 + 15 = 49.

What's the pattern?

*       *       *       Stop and think about this!       *       *       *

It appears that $C(2^k) = (k-1)2^k + 1$. We leave it to you to prove this by induction. This suggests
that the number of comparisons needed to merge sort an n-long list is bounded by about $n \log_2 n$.
We've only proved this for n a power of 2 and will not give a general proof.
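
The conjectured formula can be checked numerically, although a check is not a proof. The Python
sketch below is an illustration under stated assumptions: `merge` follows the two-pointer
description and counts comparisons, `sort` mirrors Procedure SORT with lists in place of the
subscripted sequences, and `upper_bound` mirrors Procedure C. All three names are hypothetical.

```python
def merge(u, v):
    """Merge sorted lists u and v with two pointers; return (merged list, comparisons)."""
    i = j = comparisons = 0
    merged = []
    while i < len(u) and j < len(v):
        comparisons += 1
        if u[i] <= v[j]:
            merged.append(u[i])
            i += 1
        else:
            merged.append(v[j])
            j += 1
    merged.extend(u[i:])  # one list is used up: the rest of the other costs nothing
    merged.extend(v[j:])
    return merged, comparisons


def sort(s):
    """Merge sort a nonempty list; return (sorted list, comparisons used)."""
    n = len(s)
    if n == 1:
        return list(s), 0
    m = n // 2
    u, count_u = sort(s[:m])
    v, count_v = sort(s[m:])
    t, count_t = merge(u, v)
    return t, count_u + count_v + count_t


def upper_bound(n):
    """C(n) as computed by Procedure C."""
    if n == 1:
        return 0
    m = n // 2
    return upper_bound(m) + upper_bound(n - m) + (n - 1)


for k in range(1, 11):
    assert upper_bound(2 ** k) == (k - 1) * 2 ** k + 1

result, used = sort([7, 3, 8, 1, 6, 2, 5, 4])
assert used <= upper_bound(8)
print(result, used, upper_bound(8))  # [1, 2, 3, 4, 5, 6, 7, 8] 16 17
```

For this particular input the sort uses 16 comparisons, one fewer than the bound C(8) = 17.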

There  are  a  couple  of  points  to  notice  here.  First,  we  haven't  concerned  ourselves  with  how  the 
algorithm  will  actually  be  implemented.  In  particular,  we've  paid  no  attention  to  how  storage  will 
be  managed.  Such  a  cavalier  attitude  won't  work  with  a  computer  so  we'll  discuss  implementation 
problems  in  the  next  section.  Second,  the  recursive  algorithm  led  naturally  to  a  recursive  estimate 
for the speed of the algorithm. This is often true. □


Local  Descriptions 


We  begin  with  the  local  description  for  two  ideas  we've  seen  before  when  discussing  decision  trees. 
Then  we  look  at  the  "Tower  of  Hanoi"  puzzle,  using  the  local  description  to  illustrate  the  claims  for 
thinking  locally  made  at  the  beginning  of  this  section. 
[Figure 7.2 trees. Left: L(S) is the single leaf $s_1$. Right: root L(S) with children $s_1, L(S - \{s_1\})$ through $s_n, L(S - \{s_n\})$.]

Figure 7.2 The two cases for the local description of L(S), the lex order permutation tree for
$S = \{s_1, \ldots, s_n\}$. Left: the initial case n = 1. Right: the recursive case n > 1.

Example 7.14 The local description of lex order permutations Suppose that S is an n
element set with elements $s_1 < \cdots < s_n$. In Section 3.1 we discussed how to create the decision
tree for generating the permutations of S in lex order. (See page 70.) Now we'll give a recursive
description that follows the pattern in Definition 7.3 (p. 204).

Let L(S) stand for the decision tree whose leaves are labeled with the permutations of S in
lex order and whose root is labeled L(S). If x is some string of symbols, let x,L(S) stand for the
tree L(S) with the string of symbols "x," appended to the front of each label of L(S). For Case 1 in
Definition 7.3, n = 1. Then L(S) is simply one leaf labeled $s_1$. See Figure 7.2 for n > 1. What
we have just given is called the local description of the lex order permutation tree because it looks
only at what happens from one step of the inductive definition to the next. In other words, a local
description is nothing more than the statement of Definition 7.3 for a specific problem.

We'll use induction to prove that this is the correct tree. When n = 1, it is clear. Suppose it is
true for all S with cardinality less than n. The permutations of S in lex order are those beginning
with $s_1$ followed by those beginning with $s_2$ and so on. If $s_k$ is removed from those permutations of
S beginning with $s_k$, what remains is the permutations of $S - \{s_k\}$ in lex order. By the induction
hypothesis, these are given by $L(S - \{s_k\})$. Note that the validity of our proof does not depend on
how they are given by $L(S - \{s_k\})$. □
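The local description translates almost line for line into a recursive routine. Here is a hedged Python sketch; the name `lex_permutations` and the use of tuples for permutations are illustrative choices, and the elements of S are assumed to be distinct.

```python
def lex_permutations(s):
    """List the permutations of the set s in lex order, following L(S)."""
    s = sorted(s)
    if len(s) == 1:
        # Case n = 1: a single leaf labeled s_1.
        return [tuple(s)]
    result = []
    for i, x in enumerate(s):
        rest = s[:i] + s[i + 1:]  # S - {s_k}
        # Child "s_k, L(S - {s_k})": put s_k in front of each leaf label.
        result.extend((x,) + p for p in lex_permutations(rest))
    return result


print(lex_permutations([1, 2, 3]))
```

Reading the returned list from left to right corresponds to reading the leaves of the fully expanded tree from left to right.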

Example 7.15 Local description of Gray code for subsets We studied Gray codes for
subsets in Examples 3.12 (p. 82) and 3.13 (p. 86). We can give a local description of the algorithm
as

[Figure: tree diagrams for G(1) and G(n + 1)]

where n > 0, R(T) is T with the order of the leaves reversed, and 1T is T with 1 prepended to each
leaf. Alternatively, we could describe two interrelated trees, where now R(n) is the tree for the Gray
code listed in reverse order:

[Figure: tree diagrams for G(1), R(1), G(n + 1) and R(n + 1)]

The last tree may appear incorrect, but it is not. When we reverse G(n + 1), we must move 1R(n)
to the left child and reverse it. Since the reversal of R(n) is G(n), this gives us the left child of
R(n + 1). The right child is explained similarly. □
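Both descriptions can be sketched in Python. We assume, as in the usual reflected Gray code, that the leaves of G(1) are 0 and 1 and that the children of G(n + 1) are 0G(n) and 1R(n); the function names `gray` and `gray_reversed` are our own.

```python
def gray(n):
    """Leaves of G(n), left to right, as strings of 0s and 1s."""
    if n == 1:
        return ["0", "1"]
    prev = gray(n - 1)
    # Left child 0G(n-1), right child 1R(G(n-1)).
    return ["0" + w for w in prev] + ["1" + w for w in reversed(prev)]


def gray_reversed(n):
    """Leaves of R(n), built from the interrelated description, not by reversing G(n)."""
    if n == 1:
        return ["1", "0"]
    # Left child 1G(n-1), right child 0R(n-1).
    return ["1" + w for w in gray(n - 1)] + ["0" + w for w in gray_reversed(n - 1)]


print(gray(3))
```

For small n one can confirm that `gray_reversed(n)` is exactly `gray(n)` read backwards, which is the point of the remark about the last tree.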
[Figure 7.3 panels: (a) Starting, (b) Intermediate, (c) Illegal!]

Figure 7.3 Three positions in the Tower of Hanoi puzzle for n = 4.


[Figure 7.4 trees. Left: $H(1,S,E,G) = S \xrightarrow{1} G$. Right: root $H(n,S,E,G)$ with children $H(n-1,S,G,E)$, $S \xrightarrow{n} G$ and $H(n-1,E,S,G)$.]

Figure 7.4 The local description of the solution to the Tower of Hanoi puzzle. The left hand figure
describes the initial case n = 1 and the right hand describes the recursive case n > 1. Instead of labeling the
tree, we've identified the root vertex with the label. This is convenient if we expand the tree as in the next
figure.


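[Figure 7.5 tree. Root: $H(n,S,E,G)$. Children: $H(n-1,S,G,E)$, $S \xrightarrow{n} G$, $H(n-1,E,S,G)$. Grandchildren: $H(n-2,S,E,G)$, $S \xrightarrow{n-1} E$, $H(n-2,G,S,E)$, $H(n-2,E,G,S)$, $E \xrightarrow{n-1} G$, $H(n-2,S,E,G)$.]

Figure 7.5 The first expansion of the Tower of Hanoi tree for n > 2. This was obtained by applying
Figure 7.4 to itself.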


Example 7.16 The Tower of Hanoi puzzle The Tower of Hanoi puzzle consists of n different
sized washers (i.e., discs with holes in their centers) and three poles. Initially the washers are stacked
on one pole as shown in Figure 7.3(a). The object is to switch all of the washers from the left hand
pole to the right hand pole. The center pole is extra, to assist with the transfers. A legal move consists
of taking the top washer from a pole and placing it on top of the pile on another pole, provided it is
not placed on a smaller washer.
How  can  we  solve  the  puzzle? 

To move the largest washer, we must move the other n - 1 to the spare peg. After moving the
largest, we can then move the other n - 1 on top of it. Let the washers be numbered 1 to n from
smallest to largest. When we are moving any of the washers 1 through k, we can ignore the presence
of all larger washers beneath them. Thus, moving washers 1 through n - 1 from one peg to another
when washer n is present uses the same moves as moving them when washer n is not present. Since
the problem of moving washers 1 through n - 1 is simpler, we practically have a recursive description
of a solution. All that's missing is the observation that the simplest case, n = 1, is trivial. The local
description of the algorithm is shown in Figure 7.4 where $X \xrightarrow{k} Y$ indicates that washer k is to be
moved from peg X to peg Y.

If we want to think globally, we need to expand this tree until all the $H(\cdots)$ are replaced by
moves. In other words, we continue until reaching H(1, X, Y, Z), which is simply $X \xrightarrow{1} Z$. How much
expansion is required to reach this state depends on n. The first step in expanding this tree is shown
in Figure 7.5.
To get the sequence of moves, expand the tree as far as possible, extend the edges leading to
leaves so that they are all on the same level, and then list the leaves as they are encountered reading
from left to right. When n = 3, we can use Figure 7.5 and the left side of Figure 7.4. The resulting
sequence of moves is

$S \xrightarrow{1} G$, $S \xrightarrow{2} E$, $G \xrightarrow{1} E$, $S \xrightarrow{3} G$, $E \xrightarrow{1} S$, $E \xrightarrow{2} G$, $S \xrightarrow{1} G$.

As  you  can  see,  this  is  getting  rather  complex. 

Here's  some  pseudocode  that  implements  the  local  description.  It's  constructed  directly  from 
Figure  7.4.  We  execute  it  by  running  H(n, start, extra, goal). 

Procedure H(n,S,E,G)
  If (n = 1)
    Print: Move washer 1 from S to G
    Return
  End if
  H(n-1,S,G,E)
  Print: Move washer n from S to G
  H(n-1,E,S,G)
End
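The pseudocode can be turned into a short Python sketch. Rather than printing, this version returns the moves as a list so that they can be inspected; the function name `hanoi` and the (washer, from, to) triples are illustrative choices.

```python
def hanoi(n, start="S", extra="E", goal="G"):
    """Return the moves of H(n, start, extra, goal) as (washer, from_peg, to_peg) triples."""
    if n == 1:
        return [(1, start, goal)]
    return (hanoi(n - 1, start, goal, extra)     # left child
            + [(n, start, goal)]                 # middle child
            + hanoi(n - 1, extra, start, goal))  # right child


for washer, src, dst in hanoi(3):
    print(f"Move washer {washer} from {src} to {dst}")
```

Running it with n = 3 reproduces the seven moves listed above, in the same order.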

How many moves are required to solve the puzzle? Let the number be $h_n$. From the local
description we have $h_1 = 1$ and $h_n = h_{n-1} + 1 + h_{n-1} = 2h_{n-1} + 1$ for n > 1. Using this one can
prove that $h_n = 2^n - 1$.

We can prove by induction that the algorithm works. It clearly works for n = 1. Suppose n > 1.
By  induction,  the  left  child  in  the  local  description  (Figure  7.4)  moves  the  n  —  1  smallest  washers 
from  S  to  E.  Thus  the  move  in  the  middle  child  is  valid.  Finally,  the  right  child  moves  the  n  —  1 
smallest  washers  from  E  to  G  (again,  by  induction). 

We can prove other things as well. For example, washer 1 moves on the kth move if and only
if k is odd. Again, this is done by induction. It is true for n = 1. For n > 1, it is true for the left
child of the local description by induction. Similarly, it is true for the right child because the left child
and the middle child involve a total of $h_{n-1} + 1 = 2^{n-1}$ moves, which is an even number. If you
enjoy challenges, you may wish to pursue this further and try to determine for all k when washer k
moves as well as what each move is. Determining "when" is not too difficult. Determining "what"
is tricky. □


*Computer  Implementation 


Computer  implementation  of  recursive  procedures  involves  the  use  of  stacks  to  store  information  for 
different  levels  of  the  recursion.  A  stack  is  a  list  of  data  in  which  data  is  added  to  the  end  of  the 
list  and  is  also  removed  from  the  same  end.  The  end  of  the  list  is  called  the  top  of  the  stack,  adding 
something  is  called  pushing  and  removing  something  is  called  popping. 
Example 7.17 Implementing the Tower of Hanoi solution Let's return to the Tower of
Hanoi procedure H(n, S, E, G), which is described in Figure 7.4. To begin, we push n, S, E and G
on  the  stack  and  call  the  program  H.  The  stack  entries,  from  the  top,  may  be  referred  to  as  the  first, 
second  and  so  on  items.  If  n  =  1,  H  simply  carries  out  the  action  in  the  left  side  of  Figure  7.4.  If 
n  >  1,  it  carries  out  actions  corresponding  to  each  of  the  three  sons  on  the  right  side  of  Figure  7.4 
in  turn:  The  left  son  causes  it 

•  to  push  n  —  1  and  the  second,  fourth  and  third  items  on  the  stack,  in  that  order, 

•  to  call  the  program  H 

and,  when  H  finishes, 

•  to  pop  four  items  off  the  stack. 

The  middle  son  is  similar  to  n  =  1  and  the  right  son  is  similar  to  the  left  son. 

You  may  find  it  helpful  to  see  what  this  process  leads  to  when  n  =  3.  How  does  it  relate  to 
Figure  7.5?  □ 
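To see a stack doing the work of the recursion, here is a simplified Python sketch. It is not the frame-by-frame mechanism described above: instead of pushing arguments and popping them when a call finishes, it pushes the three children of Figure 7.4 as pending work items. The function name and the tagged tuples are our own assumptions.

```python
def hanoi_with_stack(n, start="S", extra="E", goal="G"):
    """Produce the moves of H(n, S, E, G) using an explicit stack instead of recursion."""
    moves = []
    # Each entry is a pending call ("call", n, s, e, g)
    # or a pending single move ("move", washer, from_peg, to_peg).
    stack = [("call", n, start, extra, goal)]
    while stack:
        frame = stack.pop()
        if frame[0] == "move":
            moves.append(frame[1:])
            continue
        _, k, s, e, g = frame
        if k == 1:
            moves.append((1, s, g))
        else:
            # Push the children right to left so the left child is popped first.
            stack.append(("call", k - 1, e, s, g))
            stack.append(("move", k, s, g))
            stack.append(("call", k - 1, s, g, e))
    return moves


print(hanoi_with_stack(3))
```

The stack never holds more than a few entries per level of the tree, which mirrors the storage used by nested recursive calls.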

Example  7.18  Computing  a  bound  on  comparisons  in  merge  sorting  Let's  look  at  the 
pseudocode  for  computing  C(n),  the  upper  bound  on  the  number  of  comparisons  in  our  merge  sort 
(Example  7.13  (p.  211)).  Here  it  is 

Procedure C(n)
  C = 0
  If (n = 1)
    Return C
  End if
  Let m be n/2 with remainder discarded
  C = C + C(m)
  C = C + C(n-m)
  C = C + (n-1)
  Return C
End

This  has  a  new  feature  not  present  in  H:  The  procedure  contains  the  variable  C  which  we  must  save 
if  we  call  C  recursively.  This  can  be  done  as  follows.  When  a  procedure  is  called,  space  is  allocated 
on  (pushed  onto)  the  stack  to  store  all  of  its  "local  variables"  (that  is,  variables  that  exist  only  in 
the  procedure).  When  the  procedure  is  done  the  space  is  deallocated  (popped  off  the  stack).  Thus, 
in  a  programming  language  that  permits  recursion,  each  call  of  a  procedure  Proc  uses  space  for 

•  the  address  in  the  calling  procedure  to  return  to  when  Proc  is  done, 

•  the  values  of  the  variables  passed  to  Proc  and 

•  the  values  of  the  variables  that  are  local  to  Proc 

until  the  procedure  Proc  is  done.  Since  a  recursive  procedure  may  call  itself,  which  calls  itself,  which 
calls itself, and so on, it may use a considerable amount of storage. □
Example 7.19 Implementing merge sorting Look at Example 7.13 (p. 211). It requires
a tremendous amount of extra storage since we need space for the s, t, u and v arrays every
time the procedure calls itself. If we want to implement this algorithm on a real computer, it will
have to be rewritten to avoid creating arrays recursively. This can be done by placing the sorted
array in the original array. Here's the new version

Procedure SORT(a[lo] through a[hi])
  If (lo = hi), then Return
  Let m be (lo + hi)/2 with remainder discarded
  SORT(a[lo] through a[m])
  SORT(a[m+1] through a[hi])
  MERGE(a[lo] through a[m] with a[m+1] through a[hi])
End

This  requires  much  less  storage.  A  simple  implementation  of  MERGE  requires  a  temporary  array, 
but  multiple  copies  of  that  array  will  not  be  created  through  recursive  calls  because  MERGE  is  not 
recursive. The need for the additional array can be alleviated, but we won't discuss that. □
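A hedged Python sketch of this in-place version follows. It assumes the "simple implementation of MERGE" mentioned above: one temporary list `tmp`, allocated once and shared by every call. The parameter names and the 0-based inclusive indices `lo` and `hi` are our own conventions.

```python
def merge_sort(a, lo=0, hi=None, tmp=None):
    """Sort a[lo..hi] in place; tmp is one scratch list reused by every merge."""
    if hi is None:
        hi = len(a) - 1
    if tmp is None:
        tmp = [None] * len(a)  # allocated once, not once per recursive call
    if lo >= hi:
        return
    m = (lo + hi) // 2
    merge_sort(a, lo, m, tmp)
    merge_sort(a, m + 1, hi, tmp)
    merge(a, lo, m, hi, tmp)


def merge(a, lo, m, hi, tmp):
    """Merge the sorted runs a[lo..m] and a[m+1..hi] through tmp, then copy back."""
    i, j, k = lo, m + 1, lo
    while i <= m and j <= hi:
        if a[i] <= a[j]:
            tmp[k] = a[i]
            i += 1
        else:
            tmp[k] = a[j]
            j += 1
        k += 1
    while i <= m:
        tmp[k] = a[i]
        i += 1
        k += 1
    while j <= hi:
        tmp[k] = a[j]
        j += 1
        k += 1
    a[lo:hi + 1] = tmp[lo:hi + 1]


data = [5, 2, 9, 1, 5, 6]
merge_sort(data)
print(data)  # [1, 2, 5, 5, 6, 9]
```

The recursion now stores only a few indices per level, so the extra storage is one array of length n plus a stack of depth about $\log_2(n)$.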


Exercises 


7.3.1.  In  Example  7.13  we  computed  an  upper  bound  C(n)  on  the  number  of  comparisons  required  in  a 
merge  sort.  The  purpose  of  this  exercise  is  to  compute  a  lower  bound.  Call  this  bound  c(n). 

(a) Explain why merging two sorted lists of lengths $k_1$ and $k_2$ requires at least $\min(k_1, k_2)$
comparisons, where "min" denotes minimum. Give an example of when this is achieved for all values of
$k_1$ and $k_2$.

(b)  Write  code  like  Procedure  C(n)  in  Example  7.13  to  compute  c(n). 

(c) State and prove a formula for c(n) when $n = 2^k$, a power of 2. Compare c(n) with C(n) when
n  is  a  large  power  of  2. 

7.3.2. Give a local description of listing the strictly decreasing functions from $\underline{k}$ to $\underline{n}$ in lex order. (These
are the k-subsets of $\underline{n}$.) Call the list D(n, k) and use the notation i, D(j, k) to mean the list obtained
by prepending i to each of the functions in D(j, k) written in one-line form. For example

D(3,2) = (2,1; 3,1; 3,2) and 5,D(3,2) = (5,2,1; 5,3,1; 5,3,2).


7.3.3.  Merging  two  lists  in  a  single  list  stored  elsewhere  requires  that  each  item  be  moved  once.  Dividing 
a  list  approximately  in  two  requires  no  moves.  State  and  prove  a  formula  for  the  number  of  moves 
required by a merge sort of n items when $n = 2^k$, a power of 2.
7.3.4. We have a pile of n coins, all of which are identical except for a single counterfeit coin which is lighter
than the other coins. We have a "beam balance," a device which compares two piles of coins and
tells which pile is heavier. Here is a recursive algorithm for finding the counterfeit coin in a set of
n > 2 coins.

Procedure Find(n, Coins)
  If (n = 2)
    Put one coin in each pile
    and report the result.
  Else
    Select a coin C in Coins.
    Find(n-1, Coins-C)
    If a counterfeit is reported, report it.
    Else report C.
    Endif
  Endif
End

Since  Find  only  uses  the  beam  balance  if  n  =  2,  this  recursive  algorithm  finds  the  counterfeit  coin 
by  using  the  beam  balance  only  once  regardless  of  the  value  of  n  >  2.  What  is  wrong?  How  can  it 
be  corrected? 

7.3.5. Suppose we have a way to print out the characters 0-9 and -, but do not have a way to print out
integers such as -360. We want a procedure OUT(m) to print out integers m, whether positive, negative,
or zero, as strings of digits. If n > 0 is a positive integer, let q and r be the quotient and remainder
when n is divided by 10.

(a) Use the fact that the digits of n are the digits of q followed by the digit r to write a recursive
procedure OUT(m) that prints out m for any integer (positive, negative, or zero).

(b)  What  are  the  simplest  objects  in  your  recursive  solution? 

(c)  Explain  why  your  procedure  never  runs  forever. 

7.3.6.  Let  n  >  0  be  an  integer  and  let  q  and  r  be  the  quotient  and  remainder  when  n  is  divided  by  10.  We 
want  a  procedure  DSUM(n)  to  sum  the  digits  of  n. 

(a)  Using  the  fact  that  the  sum  of  the  digits  of  n  equals  r  plus  the  sum  of  the  digits  of  q,  write  a 

recursive  procedure  DSUM(n). 

(b)  What  are  the  simplest  objects  in  your  recursive  solution? 

(c)  Explain  why  your  procedure  never  runs  forever. 

7.3.7. What is the local description for the tree that generates the decreasing functions in $\underline{n}^{\underline{k}}$? Decreasing
functions were discussed in Example 3.8.

7.3.8.  Expand  the  local  description  of  the  Tower  of  Hanoi  to  the  full  tree  for  n  =  2  and  for  n  =  4.  Using 
the  expanded  trees,  write  down  the  sequence  of  moves  for  n  =  2  and  for  n  =  4. 

7.3.9. Let S(n) be the number of moves required to solve the Tower of Hanoi puzzle.

(a) Prove by induction that Procedure H takes the least number of moves.

(b) Convert Procedure H into a procedure that computes S(n) recursively as was done for sorting
in Example 7.13. Translate the code you have just written into a recursion for S(n).

(c) Construct a table of S(n) for $1 \le n \le 7$.

(d) Find a simple formula (not a recursion) for S(n) and prove that it is correct by using the result
in (b) and induction.

(e) Assuming the starting move is called move one, what washer is moved on move k?
Hint. There is a simple description in terms of the binary representation of k.

*(f) What are the source and destination poles on move k?
7.3.10. We have discovered a simpler procedure for the Tower of Hanoi: it only involves one recursive call.
To move washers k to n we use H(k, n, S, E, G). Here's the procedure.

Procedure H(k,n,S,E,G)
  If (k = n)
    Move washer n from S to G
    Return
  End if
  Move washer k from S to E
  H(k+1,n,S,E,G)
  Move washer k from E to G
  Return
End

To get the solution, run H(1, n, S, E, G). This is an incorrect solution to the Tower of Hanoi problem.
Which of the two conditions for a recursive solution fails and how does it fail? Why does the algorithm
in the text not fail in the same way?

7.3.11.  We  consider  a  modification  of  the  Tower  of  Hanoi.  All  the  old  rules  apply,  but  the  moves  are  more 

limited:  You  can  think  of  the  poles  as  being  in  a  row  with  the  extra  pole  in  the  middle.  The  new  rule 
then  says  that  a  washer  can  only  move  to  an  adjacent  pole.  In  other  words,  a  washer  can  never  be 
moved  directly  from  the  original  starting  pole  to  the  original  destination  pole.  Thus,  when  n  =  1  we 
require two moves: $S \to E$ and $E \to G$.

Let $H^*(n, P_1, P_2, P_3)$ be the tree that moves n washers from $P_1$ to $P_3$ while using $P_2$ as the
extra pole. The middle pole is $P_2$.

(a)  At  the  start  of  the  problem,  we  described  the  moves  for  n  =  1.  For  n  >  1,  washer  n  must  first 

move  to  the  extra  post  and  then  to  the  goal.  The  other  n  —  1  washers  must  first  be  stacked 
on  the  goal  and  then  on  the  start  to  allow  these  moves.  Draw  the  local  description  of  H*  for 
n  >  1. 

(b) Let $h_n$ be the number of washers moved by $H^*(n, S, E, G)$. Write down a recursion for $h_n$,
including initial conditions.

(c) Compute the first few values of $h_n$, guess the general solution, and prove it.

7.3.12. The number of partitions of the set $\underline{n}$ into k blocks was defined in Example 1.27 to be S(n, k), the
Stirling numbers of the second kind. We developed the recursion S(n, k) = S(n-1, k-1) + kS(n-1, k)
by considering whether n was in a block by itself or in one of the k blocks of S(n-1, k). By using
the actual partitions instead of just counting them, we can interpret the recursion as a means of
producing all partitions of $\underline{n}$ with k blocks.

(a)  Write  pseudocode  to  do  this. 

(b)  Draw  the  local  description  for  the  algorithm. 
7.3.13. We want to produce all sequences $\alpha = a_1, \ldots, a_n$ where $1 \le a_i \le k_i$. This is to be done so that if $\beta$ is
produced immediately after $\alpha$, then all but one of the entries in $\beta$ are the same as in $\alpha$ and that entry
differs from the $\alpha$ entry by one. Such a list of sequences is called a Gray code. If T is a tree whose
leaves are labeled by sequences, let a, T be the same tree with each leaf label $\alpha$ replaced by a, $\alpha$. Let
R(T) be the tree obtained by taking the mirror image of T. (The sequences labeling the leaves are
moved but are not reversed.) For example, if the leaves of T are labeled, from left to right,

1,2   1,3   2,4   1,4   2,3,

then the leaves of R(T) are labeled, from left to right,

2,3   1,4   2,4   1,3   1,2.

Let $G(k_1, \ldots, k_n)$ be the decision tree with the local description shown below. Here n > 1, $H =
G(k_2, \ldots, k_n)$ and T is either H or R(H) according as $k_1$ is odd or even.

[Figure: left, the tree G(k) with leaves 1, 2, ..., k; right, the tree $G(k_1, \ldots, k_n)$ with children 1,H   2,R(H)   3,H   ...   $k_1$,T]

(a) Draw the full tree for G(3, 2, 3) and the full tree for G(2, 3, 3).

(b) Prove that $G(k_1, \ldots, k_n)$ contains all sequences $a_1, \ldots, a_n$ where $1 \le a_i \le k_i$.

(c) Prove that adjacent leaves of $G(k_1, \ldots, k_n)$ differ in exactly one entry and that entry changes
by one from one leaf to the next.

*(d) Suppose that $k_1 = \cdots = k_n = 2$. Describe RANK($\alpha$).

*(e) Suppose that $k_1 = \cdots = k_n = 2$. Tell how to find the sequence that follows $\alpha$ without using
RANK and UNRANK.

7.3.14.  For  each  of  the  previous  exercises  that  requested  pseudocode,  tell  what  is  placed  on  the  stack  as  a 
result  of  the  recursive  call. 
In its narrowest sense, divide and conquer refers to the division of a problem into a few smaller
problems  that  are  of  the  same  kind  as  the  original  problem  and  so  can  be  handled  by  a  recursive 
method.  We've  seen  binary  insertion,  Quicksort  and  merge  sorting  as  examples  of  this.  In  a  broader 
sense,  divide  and  conquer  refers  to  any  method  that  divides  a  problem  into  a  few  simpler  problems. 
Heapsort  illustrates  this  broader  definition. 

The broad divide and conquer technique is important for dealing with most complex situations.
Delegation of responsibility is an application in everyday life. Scientific investigation often
employs divide and conquer. In computer science it appears in both the design and implementation
of algorithms, where it is referred to by such terms as "top-down programming," "structured
programming," "object oriented programming" and "modularity." Properly used, these are techniques
for efficiently creating and implementing understandable, correct and flexible programs.

What  tools  are  available  for  applying  divide  and  conquer  to  smaller  problems?  For  example,  how 
might  one  discover  the  algorithms  we've  discussed  in  this  text?  An  algorithm  that  has  a  nonrecursive 
nature  doesn't  seem  to  fit  any  general  rules;  for  example,  all  we  can  recommend  for  discovering 
something  like  Heapsort  is  inspiration.  You  can  cultivate  inspiration  by  being  familiar  with  a  variety 
of  ideas  and  by  trying  to  look  at  problems  in  novel  ways. 

We can say a bit more about trying to discover recursive algorithms; that is, algorithms that
use divide and conquer in its narrowest sense. Suppose the data is such that it is possible to split
it into a few large blocks. You can ask yourself if anything is accomplished by solving the original
problem on the separate blocks and then exploiting the results. We'll see how this works by first reviewing
earlier material and then moving on to some new problems.

Example  7.20  Locating  items  in  lists  If  we  are  trying  to  find  an  item  in  an  ordered  list,  we 
can  divide  the  list  in  half.  What  does  this  accomplish? 

Suppose  the  item  actually  lies  in  the  first  half.  When  we  look  at  the  start  of  the  second  half 
list,  we  can  immediately  solve  the  problem  for  that  sublist:  the  item  we're  looking  for  is  not  in  that 
sublist  because  it  precedes  the  first  item.  Thus  we've  divided  the  original  problem  into  two  problems, 
one  of  which  is  trivial.  Binary  insertion  exploits  this  observation.  Analysis  of  the  resulting  algorithm 
shows that it is considerably faster than simply looking through the list item by item. □
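The halving idea can be sketched in Python as a search routine. This is an illustration of the observation in the example, not the text's binary insertion procedure; the name `find` and the half-open index range are our own assumptions.

```python
def find(items, target, lo=0, hi=None):
    """Return an index of target in the sorted list items[lo:hi], or None if absent."""
    if hi is None:
        hi = len(items)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    if target < items[mid]:
        # The target precedes items[mid], so the second half is settled at once.
        return find(items, target, lo, mid)
    if target > items[mid]:
        return find(items, target, mid + 1, hi)
    return mid


print(find([1, 3, 5, 7, 9, 11], 7))  # 3
```

Each call discards about half of the remaining list, so roughly $\log_2(n)$ comparisons with list items suffice for a list of length n.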

Example 7.21 Sorting If we are trying to sort a list, we could divide it into two parts and sort
each part separately. What does this accomplish? That depends on how we divided the list.

Suppose  that  we  divided  it  in  half  arbitrarily.  If  each  half  is  sorted,  then  we  must  merge  two 
sorted  lists.  Some  thought  reveals  that  this  is  a  fairly  easy  process.  Exploiting  this  idea  leads  to 
merge  sorting.  Analysis  of  the  algorithm  shows  that  it  is  fast. 

Suppose  that  we  can  arrange  the  division  so  that  all  the  items  in  the  first  part  should  precede  all 
the  items  in  the  other  part.  When  the  two  parts  are  sorted,  the  list  will  be  sorted.  How  can  we  divide 
the  list  this  way?  A  bit  of  thought  and  a  clever  idea  may  lead  to  the  method  used  by  Quicksort. 
Analysis of the algorithm shows that it is usually fast. □

Example 7.22 Calculating powers Suppose that we want to calculate $x^n$ when n is a large
positive integer. A simple way to do this is to multiply x by itself the appropriate number of times.
This requires n - 1 multiplications.

We can do better with divide and conquer. Suppose that n = mk. We can compute $y = x^m$
and then compute $x^n$ by noting that it equals $y^k$. Using the method in the previous paragraph to
compute y and then to compute $y^k$ means that we require only (m - 1) + (k - 1) = m + k - 2
multiplications. This is much less than n - 1 = mk - 1. We'll call this the "factoring method."

As  is  usual  with  divide  and  conquer,  recursive  application  of  the  idea  is  even  better. 

In other words, we regard the computation of $x^m$ and $y^k$ as new problems and solve them by first
factoring  m  and  k  and  so  forth.  For  example,  computing       requires  only  t  multiplications. 

There  is  a  serious  drawback  with  the  factoring  method:  n  may  not  have  many  factors;  in  fact, 
it  might  even  be  a  prime.  What  can  we  do  about  this? 

If n > 3 is a prime, then n - 1 is not a prime since it is even. Thus we can use the factoring
method to compute $x^{n-1}$ and then multiply it by x. We still have to deal with the factorization
problem.  This  is  getting  complicated,  so  perhaps  we  should  look  for  a  simpler  method.  Try  to  think 
of  something. 

*       *       *       Stop  and  think  about  this!        *       *  * 

The request that you try to think of something was quite vague, so it is quite likely that different
people would have come up with different ideas. Here's a fairly simple method that is based on the
observations $x^{2m} = (x^m)^2$ and $x^{2m+1} = (x^m)^2 x$ applied recursively. Let the binary representation
of n be $b_k b_{k-1} \cdots b_0$; that is

$$n = \sum_{i=0}^{k} b_i 2^i = (\cdots((b_k)2 + b_{k-1})2 + \cdots + b_1)2 + b_0. \qquad (7.10)$$

It follows that

$$x^n = (\cdots((x^{b_k})^2 x^{b_{k-1}})^2 \cdots x^{b_1})^2 x^{b_0},$$

where $x^{b_i}$ is either x or 1 and so requires no multiplications to compute. Since multiplication by 1 is
trivial, the total number of multiplications is k (from squarings) plus the number of $b_0$ through $b_{k-1}$
which are nonzero. Thus the number of multiplications is between k and 2k. Since $2^{k+1} > n \ge 2^k$
by (7.10), this method always requires $\Theta(\ln n)$ multiplications.^1 In contrast, our previous methods
always required at least $\Theta(\ln n)$ multiplications and sometimes required as many as $\Theta(n)$.

Does our latest method require the least number of multiplications? Not always. There is no
known good way to calculate the minimum number of multiplications required to compute $x^n$. □
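The two observations $x^{2m} = (x^m)^2$ and $x^{2m+1} = (x^m)^2 x$ lead directly to a recursive routine. The Python sketch below also counts multiplications so the bound between k and 2k can be checked; the name `power` and the returned pair are our own choices, and n is assumed to be a positive integer.

```python
def power(x, n):
    """Return (x**n, number of multiplications used) for an integer n >= 1."""
    if n == 1:
        return x, 0
    half, count = power(x, n // 2)
    result = half * half  # one squaring per binary digit after the leading one
    count += 1
    if n % 2 == 1:
        result *= x       # one extra multiplication when the current bit is 1
        count += 1
    return result, count


print(power(3, 13))  # (1594323, 5)
```

Here 13 is 1101 in binary, so k = 3 and two of $b_0, b_1, b_2$ are nonzero, giving 3 + 2 = 5 multiplications instead of the 12 needed by repeated multiplication.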

Example 7.23 Finding a maximum subsequence sum Suppose we are given a sequence of
n arbitrary real numbers $a_1, a_2, \ldots, a_n$. We want to find i and $j \ge i$ such that $\sum_{k=i}^{j} a_k$ is as large as
possible.

Here's a simple approach: For each $1 \le i \le j \le n$ compute $A_{i,j} = \sum_{k=i}^{j} a_k$, then find the
maximum of these numbers. Since it takes j - i additions to compute $A_{i,j}$, the number of additions
required is

$$\sum_{j=1}^{n} \sum_{i=1}^{j} (j - i),$$

which turns out to be approximately $n^3/6$. The simple approach to find the maximum of the
approximately $n^2/2$ numbers $A_{i,j}$ requires about $n^2/2$ comparisons. Thus the total work is $\Theta(n^3)$.

Can we do better? Yes, there are ways to compute the $A_{i,j}$ in $\Theta(n^2)$. The work will be $\Theta(n^2)$.

Can  we  do  better?  Yes,  there  is  a  divide  and  conquer  approach.  The  idea  is 

• split $a_1, \ldots, a_n$ into two sequences $a_1, \ldots, a_k$ and $a_{k+1}, \ldots, a_n$, where $k \approx n/2$,

• compute the information for the two halves recursively,

• put the two halves together to get the information for the original sequence $a_1, \ldots, a_n$.

There is a problem with this. Consider the sequence 3, -4, 2, 2, -4, 3. The two half sequences each
have a maximum of 3, but the maximum of the entire sequence is 2 + 2 = 4. The problem arises
because the maximum sum is split between the two half-sequences 3, -4, 2 and 2, -4, 3. We get
around this by keeping track of more information. In fact we keep track of the maximum sum, the
maximum sum that starts at the left end of the sequence, the maximum sum that ends at the right
end of the sequence, and the total sum. Here's an algorithm.

Procedure MaxSum(M, L, R, T, ($a_1, \ldots, a_n$))
  If n = 1
    Set M = L = R = T = $a_1$
  Else
    k = $\lfloor n/2 \rfloor$
    MaxSum($M_\ell$, $L_\ell$, $R_\ell$, $T_\ell$, ($a_1, \ldots, a_k$))
    MaxSum($M_r$, $L_r$, $R_r$, $T_r$, ($a_{k+1}, \ldots, a_n$))
    M = max($M_\ell$, $M_r$, $R_\ell + L_r$)
    L = max($L_\ell$, $T_\ell + L_r$)
    R = max($R_r$, $T_r + R_\ell$)
    T = $T_\ell + T_r$
  End if
  Return
End

^1 The notation $\Theta$ indicates the same rate of growth to within a constant factor. For more details, see
page 368.
Why does this work?

• It should be clear that the calculation of T is correct.

• You should be able to see that M, the maximum sum, is either the maximum in the left half
($M_\ell$), the maximum in the right half ($M_r$), or a combination from each half ($R_\ell + L_r$). This is
just what the procedure computes.

• L, the maximum that starts on the left, either ends in the left half ($L_\ell$) or extends into the right
half ($T_\ell + L_r$), which is what the procedure computes.

•  The  reasoning  for  R  is  the  mirror  image  of  that  for  L. 

How long does this algorithm take to run? Ignoring the recursive part, there is a constant amount
of work in the code, with one constant for n = 1 and another for n > 1. Hence

(total time)  =  $\Theta(1)$ × (number of calls of MaxSum).

Every call of MaxSum when n > 1 divides the sequence. We must insert n − 1 divisions between the
elements of $a_1, \ldots, a_n$ to get sequences of length 1. Hence there are n − 1 calls of this type. MaxSum
is also called once for each of the n elements of the sequence. Thus there are a total of 2n − 1 calls and
so the running time is $\Theta(n)$. □

There  is  an  important  principle  that  surfaced  in  the  previous  example  which  didn't  arise  in  our 
simpler  examples  of  finding  a  recursive  algorithm: 

Principle  In  order  to  find  a  recursive  algorithm  for  a  problem,  it  may  be  helpful,  even  necessary, 
to  ask  for  more — either  a  stronger  result  or  the  calculation  of  more  information. 

In the example, we introduced new variables in our algorithm to keep track of other sums. Without
such variables, the algorithm would not have worked. As we remarked in Section 7.1, this principle
also applies to inductive proofs. We have seen some examples of this:

•  When  we  proved  the  existence  of  lineal  spanning  trees  in  Theorem  6.2  (p.  153),  it  was  necessary 
to  prove  that  we  could  find  one  with  any  vertex  of  the  graph  as  the  root  of  the  tree. 

•  When we studied Ramsey problems in Example 7.4 (p. 201), we had to replace $N(k)$ with the
more general $N(k_1, k_2)$ in order to carry out our inductive proof.


Exercises 


7.4.1.  What  is  the  least  number  of  multiplications  you  can  find  to  compute  each  of  the  following:  x^^,  y^^ , 
z^"^  and  w'^^ . 
7.4.2.  This problem concerns the Fibonacci numbers. They satisfy the recursion $F_n = F_{n-1} + F_{n-2}$ for
$n \ge 2$. People use various values for $F_0$ and $F_1$. We will use $F_0 = 0$ and $F_1 = 1$. (Elsewhere in the
text we may find it convenient to choose different initial values.)

(a)  Compute $F_n$ for n < 7.

(b)  Recall that $A^t$ means the transpose of the matrix $A$. Let $v_n$ be the column vector $(F_n, F_{n+1})^t$.
Show that $v_{n+1} = M v_n$ where $M = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$. Conclude that $v_n = M^n v_0$. Suggest a rapid
method for calculating $F_n$.

(c)  Show that

$$M^n = \begin{pmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{pmatrix}.$$

(d)  Use $M^{2n} = (M^n)^2$ to prove that $F_{2n} = F_n(F_{n+1} + F_{n-1}) = F_{n+1}^2 - F_{n-1}^2$ and
$F_{2n+1} = F_{n+1}^2 + F_n^2$.

7.4.3.  Suppose that $a_n = c_1 a_{n-1} + \cdots + c_k a_{n-k}$ for n > k. Extend the idea in the previous exercise to
describe how to rapidly compute $a_n$ for a large value of n.

7.4.4.  In  this  problem  you  are  given  a  bag  of  n  coins.  All  the  coins  have  the  same  weight,  except  for  one, 
which  is  counterfeit.  You  are  given  a  balance  type  scales.  This  means  that  given  two  piles  of  coins, 
you  can  use  the  scales  to  determine  whether  the  piles  have  the  same  weight  or,  if  they  differ,  which 
is  heavier.  The  goal  is  to  devise  a  strategy  for  locating  the  counterfeit  coin  with  the  least  possible 
number  of  weighings.  (See  Exercise  7.3.4  (p.  218).) 

(a)  Given  that  the  counterfeit  coin  is  lighter,  devise  a  recursive  divide  and  conquer  strategy  for 
finding  it. 

Hint. If your strategy is a good one, it will find the coin after k weighings when $n = 3^k$.

(b)  Modify the proof of the lower bound for the number of comparisons needed for sorting to show
that the average number of weighings needed in any algorithm is at least $\log_3 n$.

(c)  Devise an algorithm when it is not known whether the counterfeit coin is lighter or heavier.

(d)  Suppose that there are two counterfeit coins, both of the same weight and both lighter than the
real coins. Devise a strategy.

*7.4.5.  A  full  binary  RP-tree  is  a  rooted  plane  tree  in  which  each  vertex  is  either  a  leaf  or  has  exactly  two 
children.  An  example  is  a  decision  tree  in  which  each  decision  is  either  "yes"  or  "no". 

Suppose that we are given a full binary RP-tree T with root r and a function f from vertices to
the real numbers. If v is a vertex of T, let F(v) be the sum of f(x) over all vertices x in the subtree
rooted at v. We want to find

$$F(T) = \max_{v \in T} F(v).$$

One way to do this is to compute F(v) for each vertex $v \in T$ and then find the maximum. This takes
$\Theta(n \ln n)$ work, where n is the number of vertices in T. Find a better method.

*7.4.6.  If you found Example 7.23 easy, here's a challenge: Extend the problem to a doubly indexed array
$a_{m,n}$. In this case we want $i_1$, $i_2$, $j_1$ and $j_2$ so that $\sum_{k=i_1}^{j_1} \sum_{m=i_2}^{j_2} a_{k,m}$ is as large as possible. We
warn you that you must keep track of quite a bit more information.

7.4.7.  Here's another approach to the identities of Exercise 7.4.2. Let $a_n = a_{n-1} + a_{n-2}$, where $a_0$ and $a_1$
are given.

(a)  Prove that $a_n = a_0 F_{n-1} + a_1 F_n$ for n > 0.

(b)  Show that, if $a_0 = F_k$ and $a_1 = F_{k+1}$, then $a_n = F_{n+k}$. Conclude that
$F_{n+k} = F_k F_{n-1} + F_{k+1} F_n$.
Notes and References


Most  introductory  combinatorics  texts  discuss  induction,  but  discussions  of  recursive  algorithms  are 
harder to find. The subject of recursion is treated beautifully by Roberts [4]. Williamson [6; Ch. 6]
discusses  the  basic  ideas  behind  recursive  algorithms  with  applications  to  graphs. 

Further  discussion  of  Ramsey's  Theorem  as  well  as  some  of  its  consequences  appears  in  Cohen's 
text  [2;  Ch.  5].  A  more  advanced  treatment  of  the  theorem  and  its  generalizations  has  been  given  in 
the  monograph  by  Graham,  Rothschild  and  Spencer  [3] . 

People have studied the Tower of Hanoi puzzle with the same rules but with more than one
extra pole for a total of k > 3 poles. The optimal strategy is not known; however, it is known that
if $H_n(k)$ is the least number of moves required, then $\log_2 H_n(k) \sim (n(k-2)!)^{1/(k-2)}$ [1].

Divide  and  conquer  methods  are  discussed  in  books  that  teach  algorithm  design;  however,  little 
has  been  written  on  general  design  strategies.  Some  of  our  discussion  is  based  on  the  article  by 
Smith  [5]. 
CHAPTER 8

Sorting  Theory 


Introduction 


The  problem  of  developing  and  implementing  good  sorting  algorithms  has  been  extensively  studied. 
If  you've  taken  a  programming  course,  you  have  probably  seen  code  for  specific  sorting  algorithms. 
You  may  have  programmed  various  sorting  algorithms.  Our  focus  will  be  different,  emphasizing  the 
general  framework  over  the  specific  implementations.  We'll  also  look  at  "sorting  networks"  which 
are  a  type  of  hardware  implementation  of  certain  sorting  algorithms. 

In  the  last  section,  we'll  explore  the  "divide  and  conquer"  technique.  The  major  aspect  of  this 
technique  is  the  recursive  approach  to  problems. 

Before  discussing  sorting  methods,  we'll  need  a  general  framework  for  thinking  about  the  subject. 
Thus  we'll  look  briefly  at  how  sorting  algorithms  can  be  classified.  All  of  them  involve  making 
comparisons,  require  some  sort  of  storage  medium  for  the  list  and  must  be  physically  implemented 
in  some  fashion.  We  can  partially  classify  algorithms  according  to  how  these  things  are  handled. 

•  Type  of  comparison: 

•  Relative:  An  item  is  compared  to  another  item.  The  result  simply  says  which  of  the  two  is 
larger.  Almost  all  sorts  are  of  this  type. 

•  Absolute:  An  item  is  ranked  using  an  absolute  standard.  The  result  says  which  of  several 
items  it  equals  (or  lies  between).  These  are  the  bucket  sorts. 

•  Data  access  needed: 

•  Random Access: These types of sorts need to be able to look at almost any item in the
(partially sorted) list at almost any time.

•  Sequential Access: These types of sorts make a rather limited number of sequential passes
through the data. Consequently, the data could be stored on magnetic tape.¹ These are
usually merge sorts.

•  Implementation  method: 

•  Software: Most sorts are implemented as a program on a general purpose computer.

•  Hardware: Some sorts are implemented by hardware. Depending on how versatile the hardware
is, it could be close to a software implementation. The least versatile (and also most
common) hardware implementations are (a) the card sorters of a largely bygone era which
are used to bucket sort punched cards and (b) sorting networks, which we'll study in
Section 8.3.

¹ Magnetic tape is a medium that was used before large disc drives were available at reasonable
prices. The data on the tape can be thought of as one long list. It was very time consuming to move
to a position in the list that was far away from the current position. Hence it was desirable to read
the list without skipping around. This is the origin of the term sequential access. One tape provided
a single sequential access medium. Two tapes provided two sequential access media, and so forth.


8.1   Limits  on  Speed 


Suppose  someone  comes  to  you  with  two  sorting  algorithms  and  asks  you  which  is  faster.  How  do 
you  answer  him?  Unless  he  has  actual  code  for  a  particular  machine,  you  couldn't  obtain  actual 
times.  However,  since  comparisons  are  the  heart  of  sorting,  we  could  ask:  "How  many  comparisons 
does  this  algorithm  make  in  the  process  of  sorting?"  We  could  then  suggest  that  the  algorithm  that 
required fewer comparisons was the faster of the two. There are some factors that make this only a
rough  estimate: 

•  We  did  not  include  relocation  overhead — somehow  items  must  be  repositioned  to  obtain  the 
sorted  list. 

•  We  did  not  include  miscellaneous  overhead  such  as  initialization  and  subroutine  calls. 

We  will  ignore  such  problems  and  just  look  at  the  number  of  comparisons.  Even  so,  there  are 
problems: 

•  Using  the  parallel  processing  capabilities  of  supercomputers  or  special  purpose  devices  will  throw 
time  estimates  off  because  more  than  one  comparison  can  be  done  at  a  time.  The  amount  of 
parallelism  that  is  possible  can  vary  from  algorithm  to  algorithm. 

•  The  number  of  comparisons  needed  may  vary  greatly,  depending  on  the  order  of  the  items  in 
the  unsorted  list. 

We'll  ignore  these  factors  in  the  discussion,  except  for  parallelism  in  sorting  networks,  where  it  is  of 
major  importance. 

Besides  all  these  problems  with  estimating  running  time,  there  is  another  problem:  Running 
time  is  not  the  only  standard  that  can  be  used  to  decide  how  good  an  algorithm  is.  Other  important 
questions  include 

•  How  long  will  it  take  to  get  an  error  free  program  running? 

•  How  much  storage  space  will  the  algorithm  require? 

We'll  ignore  these  issues  and  focus  on  running  time. 

Let C(n) be the number of comparisons that an algorithm requires to sort n items. "Foul!" cries
a careful reader. "You pointed out that the time a sort takes depends on the original order of the
list and now you're talking about this number C(n) as if that weren't the case." True. We should
specify C(n) more carefully. It's reasonable to consider two measures of speed:

•  Worst  case:  C(n)  =  WC(n),  the  greatest  number  of  comparisons  the  algorithm  requires  to  sort 
n  items.  This  is  important  where  the  results  of  a  sort  are  needed  quickly. 

•  Average:  C(n)  =  AC(n),  the  average  number  of  comparisons  the  algorithm  requires  to  sort  n 
items.  This  is  important  when  we  are  doing  many  sorts  and  want  to  minimize  overall  computer 
usage. 

The average referred to here is the average over all n! possible orderings of the list. Obviously
$WC(n) \ge AC(n)$. Our goal in this section is to motivate and prove
Theorem 8.1  Lower Bound on Comparisons  AC(n), the average number of comparisons
required by any sorting algorithm that correctly sorts all possible lists of n items by
comparing pairs of elements, is at least $\log_2(n!)$.

By Stirling's formula (Theorem 1.5 (p. 12)), this bound is close to $n \log_2 n$ when n is large. In view
of this, a sorting algorithm with running time $\Theta(n \ln n)$ is usually considered to be a reasonably fast
algorithm. Most commonly used sorting algorithms are reasonably fast. We'll look at an old friend
now.

Example 8.1  Merge sorting is reasonably fast  In Example 7.13 (p. 211) we saw that a simple
merge sort takes at most about $n \log_2 n$ comparisons when n is a power of 2. (Actually, this result
is true for all large n.) Thus, a merge sort is reasonably fast. As indicated in Example 7.13, there
are some storage space problems with merge sorting. □
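
To make the comparison between merge sort and the bound of Theorem 8.1 concrete, here is a small Python sketch. It is not the merge sort of Example 7.13; it is a plain top-down merge sort written for this illustration, and the names `merge_sort` and `comparison_stats` are assumptions. It counts comparisons over all n! orderings for small n and prints them next to $\log_2(n!)$ and $n \log_2 n$.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple


def merge_sort(items: Sequence[int]) -> Tuple[List[int], int]:
    """Return (sorted list, number of pairwise comparisons used)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, c_left = merge_sort(items[:mid])
    right, c_right = merge_sort(items[mid:])
    merged: List[int] = []
    comparisons = c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1                  # one comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])               # leftovers need no comparisons
    merged.extend(right[j:])
    return merged, comparisons


def comparison_stats(n: int) -> Tuple[int, float]:
    """Return (worst case, average) comparison counts over all n! orderings."""
    counts = [merge_sort(p)[1] for p in permutations(range(n))]
    return max(counts), sum(counts) / len(counts)


for n in range(2, 8):
    worst, average = comparison_stats(n)
    print(f"n={n}: log2(n!)={math.log2(math.factorial(n)):.2f} "
          f"average={average:.2f} worst={worst} n*log2(n)={n * math.log2(n):.2f}")
```

In each row the average lies at or above $\log_2(n!)$, as the theorem requires, while staying close to it.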

Motivation  and  Proof  of  the  Theorem 


Our proof of Theorem 8.1 will be by induction. Induction requires knowing the result (in this case
$AC(n) \ge \log_2(n!)$) beforehand. How would one ever come up with the result beforehand? A result
like the Four Color Theorem (p. 158) might be conjectured after some experimentation, but one
is unlikely to stumble upon Theorem 8.1 experimentally. We will "discover" it by relating sorting
algorithms to decision trees, which lead easily to the inequality $WC(n) \ge \log_2(n!)$. One might
then test $AC(n) \ge \log_2(n!)$ for small values of n, thus motivating Theorem 8.1. Someone versed
in information theory might motivate the theorem as follows: "A comparison gives us one bit of
information. Given k bits of information, we can distinguish among at most $2^k$ different things. Since
we must distinguish among n! different arrangements, we require that $2^k \ge n!$ and so $k \ge \log_2(n!)$."
(This is motivation, not a proof; it's not even clear if it's referring to worst case or average case
behavior.) Let's get on with things.

Suppose  we  are  given  a  comparison  based  sorting  algorithm.  Since  it  is  assumed  to  correctly  sort 
n-lists,  it  must  correctly  sort  lists  in  which  all  items  are  different.  By  simply  renaming  our  items 
1,  2,  . . .,  n,  we  can  suppose  that  we  are  sorting  lists  which  are  permutations  of  n. 

Our proof will make use of the decision tree associated with this sorting algorithm. We construct
the  tree  as  follows.  Whenever  we  make  a  comparison  in  the  course  of  the  algorithm,  our  subsequent 
action  depends  on  whether  an  inequality  holds  or  fails  to  hold.  Thus  there  are  two  possible  decisions 
at  each  comparison  and  so  each  vertex  in  our  decision  tree  has  at  most  two  sons. 

Label each leaf of the tree with the permutations that led to that leaf and throw away any leaves
that are not associated with any permutation. To do this, we start at the root with a permutation
f of n and at each vertex in the tree we go left if the inequality we are checking holds and go right
if it fails to hold. At the same time, we carry out whatever manipulations on the data in f that the
algorithm requires. When we arrive at a leaf, the data in f will be sorted. Label the leaf with the f
we started out with at the root, written in one line form. Do this for all n! permutations of n.

For  example,  consider  the  following  algorithm  for  sorting  a  permutation  of  3. 

1.  If  the  entry  in  the  first  position  exceeds  the  entry  in  the  third  position,  switch  them. 

2.  If  the  entry  in  the  first  position  exceeds  the  entry  in  the  second  position,  switch  them. 

3.  If  the  entry  in  the  second  position  exceeds  the  entry  in  the  third  position,  switch  them. 

Figure 8.1 shows the labeled decision tree. Two positions where you would ordinarily expect leaves
have none because they are never reached. Consider the permuted sequence 231. The "if" in Step 1
is true and results in a switch to give 132. The second "if" is false and results in no switch. The third
is true and results in a switch to give 123. Thus, the sequence of decisions associated with sorting
231 is switch, no switch and switch, respectively.

[Figure: decision tree. Root: compare positions 1 and 3; second level: compare positions 1 and 2;
third level: compare positions 2 and 3. Leaves, left to right: 312, 231, 321, 213, 132, 123.]

Figure 8.1  A sorting algorithm decision tree. Leftward branches correspond to decisions to switch. Leaves
are labeled with starting sequences.
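
The labeling of leaves can be reproduced mechanically. The following Python sketch runs the three-step algorithm on every permutation of 3 and records the sequence of switch/no-switch decisions; the names `COMPARISONS` and `run` are introduced here for illustration only, and positions are 0-based in the code.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Steps 1-3 of the algorithm: positions (1,3), (1,2), (2,3), written 0-based.
COMPARISONS = [(0, 2), (0, 1), (1, 2)]


def run(perm: Tuple[int, ...]) -> Tuple[Tuple[bool, ...], List[int]]:
    """Return (decisions, final data); a decision is True when a switch occurs."""
    data = list(perm)
    decisions = []
    for i, j in COMPARISONS:
        switch = data[i] > data[j]
        if switch:
            data[i], data[j] = data[j], data[i]
        decisions.append(switch)
    return tuple(decisions), data


# Group starting permutations by the leaf (decision sequence) they reach.
leaves: Dict[Tuple[bool, ...], List[Tuple[int, ...]]] = {}
for p in permutations((1, 2, 3)):
    decisions, result = run(p)
    assert result == [1, 2, 3]            # the algorithm sorts correctly
    leaves.setdefault(decisions, []).append(p)

for decisions, perms in sorted(leaves.items(), reverse=True):
    path = ", ".join("switch" if d else "no switch" for d in decisions)
    print(path, "->", perms)
```

Only six of the eight possible decision sequences occur, and each reached leaf carries exactly one permutation, in agreement with the two missing leaves noted above.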

We  now  show  that  a  decision  tree  for  a  correct  sorting  algorithm  has  exactly  n!  leaves.  This 
will  follow  from  the  fact  that  each  leaf  is  labeled  with  exactly  one  permutation.  Why  can't  we  have 
two  permutations  at  the  same  leaf?  Since  each  leaf  in  any  such  decision  tree  is  associated  with 
a  particular  rearrangement  of  the  data,  both  permutations  of  n  would  be  rearranged  in  the  same 
fashion.  Since  they  differed  originally,  their  rearrangements  would  differ.  Thus  at  least  one  of  the 
rearrangements  would  not  be  1,  2, . . . ,  n  and  so  would  be  incorrectly  sorted. 

These decision trees are binary RP-trees. (An RP-tree was defined in Definition 5.12 (p. 139)
and it is binary if each node has at most two sons.) The set B of all binary RP-trees with n! leaves
includes the set S of those decision trees that come from sorting algorithms. Since S is a subset of
B, a lower bound for any function on B is also a lower bound for that function on S. Hence a lower
bound on worst case or average values for the set of all binary RP-trees with n! leaves will also be
a lower bound on WC(n) or AC(n) for any algorithm that correctly sorts n items by using pairwise
comparisons.

In our decision tree approach, a comparison translates into a decision. If we only wanted to
study WC(n), we could finish quickly as follows. By the definition of WC(n), the longest possible
sequence of decisions contains WC(n) decisions. This is called the height of the tree. To get a tree
with as many leaves as possible, we should let the tree branch as much as possible. Since the number
of nodes doubles for each level of decisions in such a tree, you should be able to see that a binary
RP-tree of height k has at most $2^k$ leaves. Since there are n! leaves, we must have $2^{WC(n)} \ge n!$.
Taking $\log_2$ of both sides gives us $WC(n) \ge \log_2(n!)$.

What  about  AC(n)?  It's  not  too  difficult  to  compute  the  average  number  of  comparisons  for 
various  binary  RP-trees  when  they  are  not  too  large.  If  you  were  to  do  that  for  a  while,  you  would 
probably  begin  to  believe  that  the  lower  bound  we  just  derived  for  WC(n)  is  also  a  lower  bound  for 
AC(n),  as  claimed  in  the  theorem.  This  completes  the  motivation  for  believing  the  theorem. 

How might we prove the theorem? Since the first decision (the one at the root) divides the tree
into two smaller trees, it seems reasonable to try induction. Unfortunately, a tree with (n+1)! leaves
is a lot bigger than one with only n! leaves. This can cause problems with induction. Let's sever our
ties with sorting algorithms and consider decision trees with any number of leaves, not just those
where the number of leaves is n! for some n.

From now on, n will indicate the total number of leaves in the tree, not the number of
things being permuted.

For a decision tree T, let TC(T) be the sum, over all leaves $\ell$ of T, of the number of decisions
needed to reach $\ell$. The average cost is then TC(T)/n. To prove the theorem, it suffices to prove

$$TC(T) \ge n \log_2 n \quad \text{for all binary RP-trees } T \text{ with } n \text{ leaves.} \tag{8.1}$$

Call this A(n). Clearly A(1) is true since $\log_2 1 = 0$.
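
As a quick check of what (8.1) asserts, the following Python sketch computes TC(T) for a few small trees. The representation is an assumption made for this illustration: a leaf is `None` and an internal vertex is a tuple of one or two subtrees; the function name `leaves_and_total_cost` is likewise hypothetical.

```python
import math
from typing import Optional, Tuple

Tree = Optional[tuple]  # None is a leaf; otherwise a tuple of one or two subtrees


def leaves_and_total_cost(tree: Tree, depth: int = 0) -> Tuple[int, int]:
    """Return (n, TC): the number of leaves and the sum of their depths."""
    if tree is None:
        return 1, depth                   # a leaf reached after `depth` decisions
    n = total = 0
    for child in tree:
        child_n, child_total = leaves_and_total_cost(child, depth + 1)
        n += child_n
        total += child_total
    return n, total


balanced = ((None, None), (None, None))   # all four leaves at depth 2
skewed = (None, (None, (None, None)))     # leaves at depths 1, 2, 3, 3
one_son = ((None, None),)                 # root with a single son

for name, tree in [("balanced", balanced), ("skewed", skewed), ("one son", one_son)]:
    n, tc = leaves_and_total_cost(tree)
    print(f"{name}: n={n}, TC={tc}, n*log2(n)={n * math.log2(n):.2f}")
```

The balanced tree attains the bound exactly (TC = 8 = 4 log₂ 4), while the other two trees exceed it.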
We now proceed by induction: Given A(k) for all k < n, we will prove A(n). Let T be a binary
RP-tree with n leaves. Consider the root of T. If it has only one son, removing the root gives a
binary RP-tree T′ with n leaves and TC(T) = TC(T′) + n. (The "+n" arises because each of the n
leaves is one vertex further from the root in T than in T′.) If A(n) were false for T, it would also be
false for T′. Thus we don't need to consider T at all if its root has only one son, and it suffices to prove
(8.1) when the root has degree 2. Let $v_L$ and $v_R$ be the two children of the root and let $T_L$ and $T_R$
be the trees rooted at $v_L$ and $v_R$. Let k be the number of leaves in $T_L$. Then $T_R$ has n − k leaves.
We have

$$TC(T) = \bigl(TC(T_L) + k\bigr) + \bigl(TC(T_R) + (n-k)\bigr) \ge k \log_2 k + (n-k) \log_2(n-k) + n, \tag{8.2}$$

where the last part follows from A(k) and A(n − k) since k < n and n − k < n. Clearly

$$k \log_2 k + (n-k) \log_2(n-k) \ge \min f(x), \tag{8.3}$$

where $f(x) = x \log_2 x + (n-x) \log_2(n-x)$ and the minimum is over all real x with $1 \le x \le n-1$.

This paragraph deals with the technicality of showing that the minimum of f(x) is $n \log_2(n/2)$
and that it occurs at x = n/2. According to calculus we can find the minimum of f(x) over an
interval by looking at the values of f(x) at the endpoints of the interval and when f′(x) = 0. The
present endpoints are awkward (but they could be dealt with). It would be nicer to increase the
interval to $0 \le x \le n$. To do this we must assign a value to $0 \log_2 0$ so that f(x) is continuous at 0.
Thus we define $0 \log_2 0$ to be

$$\lim_{x \to 0^+} x \log_2 x = \lim_{x \to 0^+} \frac{\log_2 x}{1/x} = \lim_{x \to 0^+} \frac{1/(x \ln 2)}{-1/x^2} \quad \text{(by l'Hôpital's Rule from calculus)} \quad = \lim_{x \to 0^+} \frac{-x}{\ln 2} = 0.$$

Hence $f(0) = f(n) = n \log_2 n$. Since

$$f'(x) = \log_2 x + \frac{x}{x \ln 2} - \log_2(n-x) - \frac{n-x}{(n-x) \ln 2} = \log_2\!\left(\frac{x}{n-x}\right)$$

and the logarithm is zero only at 1, it follows that f′(x) = 0 if and only if $\frac{x}{n-x} = 1$; that is, x = n/2.
Since

$$f(n/2) = n \log_2(n/2) < n \log_2 n = f(0) = f(n),$$

the minimum of f(x) occurs at x = n/2.

We have now shown that the right side of (8.3) is $n \log_2(n/2)$ and so, from (8.2),

$$TC(T) \ge n \log_2(n/2) + n = n \log_2 n - n \log_2 2 + n = n \log_2 n.$$

This completes the induction step and hence the proof of the theorem. □
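
The calculus step in the proof can be checked numerically. The sketch below evaluates f on a grid for one value of n and confirms that the smallest value occurs at x = n/2; the grid size, the choice n = 10, and the helper name `xlog2x` are arbitrary choices made here, and a numerical check of this kind is of course no substitute for the argument above.

```python
import math


def xlog2x(t: float) -> float:
    """t * log2(t), with the value at t = 0 defined to be 0 as in the proof."""
    return 0.0 if t == 0 else t * math.log2(t)


def f(x: float, n: float) -> float:
    """f(x) = x log2 x + (n - x) log2(n - x) on the interval 0 <= x <= n."""
    return xlog2x(x) + xlog2x(n - x)


n = 10
grid = [i * n / 1000 for i in range(1001)]     # 0, 0.01, ..., 10
best_x = min(grid, key=lambda x: f(x, n))

print("minimizing x on the grid:", best_x)
print("f(n/2)        =", f(n / 2, n))
print("n log2(n/2)   =", n * math.log2(n / 2))
print("f(0) = f(n)   =", f(0, n), f(n, n))
```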


Exercises 


8.1.1.  In  some  data  storage  algorithms,  information  is  stored  at  the  leaves  of  a  binary  RP-tree  and  one 
reaches  a  leaf  by  answering  a  question  at  each  vertex  on  the  path  from  the  root  to  a  leaf,  including  at 
the  leaf.  (The  question  at  the  leaf  is  to  allow  for  the  possibility  that  the  data  may  not  be  in  the  tree.) 

Suppose  there  are  n  items  of  data  stored  in  such  a  tree.  Show  that  the  average  number  of  questions 
required to recover stored data is at least $1 + \log_2 n$.
8.1.2.  The average case result in Theorem 8.1 depends heavily on the fact that we expect to visit each leaf
equally often. This may not be the case. For example, in some situations the list to be sorted is often
nearly in order already. To see the importance of such frequency of visiting the leaves, consider a
decision tree in which the relative frequency of visitation is 0.1, 0.2, 0.3, and 0.4. Find the tree for
which AC(T) is a minimum.

8.1.3.  Let T be a binary decision tree with n leaves. Suppose that TC(T) is the least it can be for a binary
RP-tree with n leaves. Since the minimum in (8.3) occurred at x = n/2, one might expect the two
principal subtrees of T to have a nearly equal number of leaves. But this is not the case. In this
exercise, we explore what can be said.

(a)  Prove that T is a full binary tree; i.e., there is no vertex with just one child.

(b)  For any vertex v in T, let h(v) be the length of the path from v to the root of T. Let $l_1$ and $l_2$ be
two leaves of T. Prove that $|h(l_1) - h(l_2)| \le 1$.

Hint. Suppose $h(l_1) - h(l_2) \ge 2$. Let v be the parent of $l_1$ and consider what happens when $l_2$
and the subtree rooted at v are interchanged.

*(c)  If $2^{m-1} < n \le 2^m$, prove that the height of T is m. (The height of a tree is the maximum of
h(v) over all vertices v of T.)

*(d)  If $2^{m-1} < n \le 2^m$, prove that the maximum number of leaves that can be in a principal subtree
of T is $2^{m-1}$. Explain how to achieve this maximum.

8.1.4.  In some sorts (e.g., a bucket sort described in the next section) a question may have more possible
answers than just "yes" or "no." Suppose that each question has k possible answers. Show that the
average number of questions required to sort n objects is at least $\log_k(n!)$. You may use the following
fact without proof:

The minimum of $x_1 \log_k x_1 + \cdots + x_d \log_k x_d$ over all positive $x_i$ that sum to n is obtained when all
the $x_i$ equal n/d and the value of the minimum is $n \log_k(n/d)$.

8.1.5.  In some data storage and retrieval algorithms, a key and data are stored at each vertex of a binary
RP-tree and the data is retrieved by means of a key. Suppose k is the key whose data one wants. At each
vertex v, one chooses the vertex (and stops) or one of the principal subtrees at v, depending on
whether k equals, exceeds, or is less than the key at v. Let TC*(T) be the sum over all vertices v of
T of the length of the path from v to the root.

(a)  Show that a binary tree with no path to the root longer than n can store at most $2^{n+1} - 1$ keys.
Hint. Obtain a recursion for the number and use induction.

(b)  Show that a tree T as in (a) storing the maximum number of possible keys has
$TC^*(T) = (n-1)2^{n+1} + 2$.


8.2   Software  Sorts 


A  merge  sort  algorithm  was  studied  in  Example  7.13  (p.  211).  You  should  review  it  now. 

All  reasonably  fast  software  sorts  use  a  divide  and  conquer  method  for  attacking  the  problem.  As 
you  may  recall,  divide  and  conquer  means  splitting  the  problem  into  a  few  smaller  problems  which 
are  easier  either  because  they  are  smaller  or  because  they  are  simpler.  In  problems  where  divide  and 
conquer  is  most  successful,  it  is  often  the  case  that  the  smaller  problems  are  simply  instances  of  the 
same  type  of  problem  and  they  are  handled  by  applying  the  algorithm  recursively.  To  give  you  a  bit 
better  idea  of  what  divide  and  conquer  means,  here  is  how  the  algorithms  we'll  discuss  use  it.  Some 
of  this  may  not  mean  much  to  you  until  you've  finished  this  section,  so  you  may  want  to  reread  this 
list  later.  This  is  by  no  means  an  exhaustive  list  of  the  different  types  of  software  sorts. 

•  Quicksort  and  merge  sorts  split  the  data  and  spend  most  of  their  time  sorting  the  separate  pieces. 
Thus  they  divide  and  conquer  by  producing  two  smaller  sorting  problems  which  are  handled  in 
a recursive manner.

Quicksort  spends  a  little  time  dividing  the  data  in  such  a  way  that  recombining  the  two 
pieces  after  they  are  sorted  is  immediate.  It  divides  the  items  into  two  collections  so  that  all  the 
items  in  the  first  collection  should  precede  all  the  items  in  the  second.  The  division  is  done  "in 
place"  by  interchanging  items  that  are  in  the  wrong  lists.  Unless  it  is  extremely  unlucky,  the  two 
collections  will  have  roughly  the  same  number  of  elements.  The  two  collections  are  then  sorted 
separately. 

Merge  sorts  reverse  this:  dividing  is  immediate  and  recombination  takes  a  little  time.  In  both 
cases,  the  "little  time"  required  is  proportional  to  the  number  of  items  being  sorted  because  it 
requires  a  number  of  comparisons  that  is  nearly  equal  to  the  number  of  items  being  sorted. 

•  An  insertion  sort  builds  up  the  sorted  list  by  taking  the  items  on  the  unsorted  list  one  at  a  time 
and  inserting  them  in  a  sorted  list  it  is  building.  Divide  and  conquer  can  be  used  in  the  insertion 
process:  To  do  a  binary  insertion  sort,  split  the  list  into  two  nearly  equal  parts,  decide  which 
sublist  should  contain  the  new  item,  and  iterate  the  process,  using  the  sublist  as  the  list. 

•  Suppose  we  are  sorting  a  list  of  words  (or  numbers).  Bucket  sort  focuses  on  one  position  in  the 
words  at  a  time.  This  is  not  usually  a  good  divide  and  conquer  approach  because  the  task  is 
not  divided  into  just  a  few  subproblems  of  roughly  equal  difficulty:  On  an  n-long  list  with  k 
characters  per  word,  we  focus  in  turn  on  each  of  the  k  positions.  When  n  is  large,  k  will  be  large, 
too. 

It  is  easy  to  get  a  time  estimate  for  the  algorithm.  The  amount  of  time  it  takes  to  process 
one  character  position  for  all  n  words  is  proportional  to  n.  Thus,  the  time  to  sort  is  proportional 
to  nk.  How  fast  a  bucket  sort  is  depends  on  how  large  k  is  compared  to  n. 

•  Heapsort  divides  the  sorting  task  into  two  simpler  tasks. 

•  First,  the  items  are  arranged  in  a  structure,  called  a  "heap,"  which  is  a  rooted  tree  such 
that  the  smallest  item  in  the  tree  is  at  the  root  and  each  of  the  sons  of  the  root  is  also  the 
root  of  a  heap. 

•  Second,  the  items  are  removed  from  the  heap  one  by  one,  starting  with  the  top  and  preserving 
the  heap  structure. 

Each  of  these  two  tasks  requires  about  the  same  amount  of  time.  Adding  an  item  to  the  heap 
is  done  in  a  recursive  manner,  as  is  removing  an  item  from  the  heap.  The  fact  that  a  heap  is 
defined  recursively  makes  it  easy  to  design  recursive  algorithms  for  manipulating  heaps. 

Binary  Insertion  Sort 


Let $u_1, u_2, \ldots, u_n$ be the unsorted list. At the end of step $t$, the sorted list will be $s_1, s_2, \ldots, s_t$. At
step $t$, an insertion sort determines where $u_t$ belongs in the sorted list, opens up space and inserts
it. For $t = 1$, this is trivial since the sorted list is empty. In general, we have a list $s_1, \ldots, s_{t-1}$. We
must first find an index $j$ such that $s_i \le u_t$ for $i < j$ and $s_i \ge u_t$ for $i \ge j$, and then define a new
sorted list by

$$\text{new } s_i = \begin{cases} \text{old } s_i & \text{if } 1 \le i < j; \\ u_t & \text{if } i = j; \\ \text{old } s_{i-1} & \text{if } j < i \le t. \end{cases}$$

How can we ensure that only a small number of comparisons is required for determining j? Simply
searching the list from the start to find the place would, on average, take an amount of time
proportional to k, the length of the sorted list. A binary insertion sort uses divide and conquer to
produce a much quicker insertion. It looks at the middle of the sorted list to decide which half should
contain $u_t$ and then iterates on that half until we are reduced to comparing $u_t$ to a single item. This
dividing of the list makes insertion into a k-long list take one more comparison than insertion into a
k/2-long list. Calculating the number of comparisons is left to Exercise 8.2.2.
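
The search for j and the insertion step can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `find_position` and `binary_insertion_sort` are invented here, positions are counted from 0 rather than 1, and the rule for breaking ties (a new item goes after equal items) is an assumption that the text leaves open.

```python
def find_position(sorted_list, item):
    """Return the 0-based index j at which item should be inserted.

    Everything before position j is <= item and everything from j on
    is > item (tie-breaking rule assumed for this sketch).
    """
    lo, hi = 0, len(sorted_list)
    while lo < hi:
        mid = (lo + hi) // 2          # look at the middle of the current sublist
        if sorted_list[mid] <= item:
            lo = mid + 1              # item belongs in the right half
        else:
            hi = mid                  # item belongs in the left half
    return lo


def binary_insertion_sort(unsorted):
    """Build the sorted list by inserting the items one at a time."""
    result = []
    for item in unsorted:
        j = find_position(result, item)
        result.insert(j, item)        # opens up space and inserts the item
    return result
```

Each pass through the loop in `find_position` halves the sublist under consideration, which is the "one more comparison for a list twice as long" behavior described above. Note that `result.insert` still moves items, so only the number of comparisons is reduced.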
Programming Note In order to avoid moving a lot of items to insert $u_t$, insertion sorts are
implemented  by  using  a  more  complex  data  structure  than  a  simple  array.  These  data  structures 
require  more  storage  and  somewhat  more  comparisons  than  binary  insertion  requires,  but  they 
require  less  movement  of  items.  We  will  not  discuss  them.  If  you  want  further  information, 
consult  a  good  text  on  data  structures  and  algorithms. 

Bucket  Sort 


As  the  name  may  suggest,  a  bucket  sort  is  like  throwing  things  into  buckets.  If  we  have  a  collection 
of buckets, we can put each item into the bucket where it belongs. If the buckets are in order and
each  bucket  contains  at  most  one  item,  then  the  items  are  sorted. 

Since  it  is  not  practical  to  have  this  many  buckets,  a  recursive  method  is  used.  This  is  better 
thought  of  in  terms  of  piles  instead  of  buckets.  Suppose  that  you  want  to  bucket  sort  the  two  digit 
numbers 

22  31  12  23  13  33  11  21. 

Here's  how  to  do  it. 

1.  Construct  piles  of  numbers,  one  pile  for  each  unit's  digit,  making  sure  that  the  order  within  a 
pile  is  the  same  as  the  order  in  the  list.  Here's  the  result. 

1:  31  11  21    2:  22  12     3:  23  13  33. 

2.  Make  a  list  from  the  piles,  preserving  the  order.  Here's  the  result. 

31  11  21  22  12  23  13  33. 

3.  Repeat  the  first  two  steps  using  the  new  list  and  using  the  ten's  digit  instead  of  the  unit's  digit. 
Here  are  the  results. 

1:  11  12  13    2:  21  22  23     3:  31  33 

gives 

11  12  13  21  22  23  31  33. 

This  method  can  be  generalized  to  k  digit  numbers  and  k  letter  words.  It  is  left  as  an  exercise. 
In  this  case,  each  item  is  examined  k  times  in  order  to  place  it  in  a  bucket.  Thus  we  place  items 
in  buckets  kn  times.  If  we  think  of  digits  as  "letters,"  then  a  k  digit  number  is  simply  a  k  letter 
word.  Bucket  sorting  puts  the  words  in  lexicographic  order.  This  sorting  method  can  only  be  used 
on  strings  that  are  thought  of  as  "words"  made  up  of  "letters"  because  we  must  look  at  items  one 
"letter"  at  a  time.  The  result  is  always  a  lexicographic  ordering. 
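
The pile-making procedure above can be sketched in Python for nonnegative integers. This is one possible rendering, not a definitive algorithm: the function name `bucket_sort` and its parameters `digits` and `base` are hypothetical, and numbers with fewer than `digits` digits are treated as if padded with leading zeroes.

```python
def bucket_sort(numbers, digits, base=10):
    """Sort nonnegative integers that have at most `digits` digits."""
    items = list(numbers)
    for position in range(digits):             # unit's digit first, then ten's, ...
        piles = [[] for _ in range(base)]      # one pile per possible digit
        for item in items:
            digit = (item // base ** position) % base
            piles[digit].append(item)          # order within a pile follows the list
        # Make a new list from the piles, preserving the order.
        items = [item for pile in piles for item in pile]
    return items
```

Calling `bucket_sort([22, 31, 12, 23, 13, 33, 11, 21], 1)` performs only the first pass and reproduces the intermediate list shown in step 2. The digit is used directly as an index into the table of piles, so no comparisons between items are made.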

Suppose that we are sorting a large number n of k letter words where k is fairly small. In this
case,  kn,  the  number  of  times  we  have  to  place  something  in  a  bucket  is  less  than  the  lower  bound, 
log2(n!),  for  the  number  of  comparisons  that  we  proved  in  the  previous  section.  How  can  this  be? 

To  begin  with,  deciding  which  bucket  to  place  an  item  in  is  not  a  simple  comparison  unless 
there  are  only  two  buckets.  If  we  want  to  convert  that  into  a  process  that  involves  making  decisions 
between  pairs  of  items,  we  must  do  something  like  a  binary  insertion,  comparing  the  item  with  the 
labels on the buckets. That will require about log2 A comparisons, where A is the number of letters
in the alphabet. Taking the number of comparisons into account, we obtain an estimate of kn log2 A
for the number of comparisons needed to bucket sort a list of n items.

It seems that we could simply keep k and A small and let n get large. This would still violate
the lower bound of $\log_2(n!)$, which is about $n \log_2 n$. What has been ignored here is the fact that the
lower bound was derived under the assumption that there can be $n!$ different orderings of a list. This
can only happen if the list contains no duplicate words. Thus we must be able to create at least n
distinct words. Since there are $A^k$ possible k letter words using an A letter alphabet, we must have
$A^k \ge n$. Thus $k \log_2 A \ge \log_2 n$ and so $kn \log_2 A \ge n \log_2 n$, which agrees with our bound.

Programming  Note  Hardware  implementations  of  a  bucket  sort  were  in  use  for  quite  some 
time before inexpensive computers largely replaced them. They sorted "IBM cards," thin
rectangular pieces of cardboard with holes punched in them by a "keypunch." The operator could
set the sorting machine to drop the cards into bins according to the jth "letter" punched on
the card. To sort cards by the word in columns i through j, the operator sorted on column j,
restacked the cards, sorted on column j - 1, restacked the cards, ..., sorted on column i and
restacked the cards. In other words, a bucket sort was used. These machines were developed
about a hundred years ago for dealing with United States census data. The company that
manufactured them became IBM.

In  software  bucket  sorts,  one  usually  avoids  comparisons  entirely  by  maintaining  an  A  long 
table  of  pointers,  one  pointer  for  each  bucket.  The  letter  is  then  used  as  an  index  into  the  table 
to  find  the  correct  bucket. 

Merge  Sorts 


We've  discussed  a  simple  merge  sort  already  in  Example  7.13  (p.  211).  Recall  that  the  list  is  divided 
arbitrarily  into  two  pieces,  each  piece  is  sorted  (by  using  the  merge  sort  recursively)  and,  finally, 
the  two  sorted  pieces  are  merged.  The  naive  method  for  merging  two  lists  is  to  repeatedly  move  the 
smaller of the top items in the two lists to the end of the merged list we are constructing, but this
cannot  be  implemented  in  a  sorting  network.  The  Batcher  sort  uses  a  more  complex  merging  process 
that  can  be  implemented  in  a  sorting  network.  We'll  study  it  in  the  next  section. 

Programming  Note  For  simplicity  assume  that  n  is  a  power  of  2.  You  can  imagine  the 
original  list  as  consisting  of  n  1-long  lists  one  after  the  other.  These  can  be  merged  two  at  a 
time to produce (n/2) 2-long lists one after the other. In general, we have $(n/2^k)$ $2^k$-long lists
which can be merged two at a time to produce $(n/2^{k+1})$ $2^{k+1}$-long lists one after the other. If
the data is on tape (see footnote page 227), one starts with two sets of $2^k$-long lists and produces
two sets of $2^{k+1}$-long lists by merging the top two lists in each set and placing the merged lists
alternately  in  the  output  sets.  Since  only  sequential  access  is  required,  this  simple  merge  sort  is 
ideally  suited  to  data  on  tape  if  four  tape  drives  are  available.  There  are  variations  of  this  idea 
which  are  faster  because  they  require  less  tape  movement. 
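
The pass structure described in this note (1-long lists merged into 2-long lists, then 4-long lists, and so on) can be sketched as follows. The sketch works on an in-memory Python list rather than on tapes, and the names `merge` and `bottom_up_merge_sort` are invented for illustration. It uses the naive merge described earlier and, as a convenience beyond the note's assumption, tolerates lengths that are not powers of 2.

```python
def merge(left, right):
    """Naive merge: repeatedly move the smaller of the two top items."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])     # at most one of these two is nonempty
    merged.extend(right[j:])
    return merged


def bottom_up_merge_sort(items):
    """Merge adjacent width-long sorted runs, doubling width each pass."""
    data = list(items)
    n = len(data)
    width = 1                   # current run length, 2^k in the text
    while width < n:
        merged = []
        for start in range(0, n, 2 * width):
            left = data[start:start + width]
            right = data[start + width:start + 2 * width]
            merged.extend(merge(left, right))
        data = merged
        width *= 2
    return data
```

Every pass reads the data strictly from front to back, which is why the method suits sequential storage.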

Quicksort 


In  Quicksort,  an  item  is  selected  and  the  list  is  divided  into  two  pieces:  those  items  that  should  be 
before  the  selected  item  and  those  that  should  be  after  it.  This  is  done  in  place  so  that  one  sublist 
precedes the other. If the two sublists are then sorted recursively using Quicksort, the entire original
list  will  be  sorted.  About  n  comparisons  are  needed  to  divide  the  list. 

How  is  the  division  accomplished?  Here's  one  method.  Memorize  a  list  element,  say  x.  Start  a 
pointer  at  the  left  end  of  the  list  and  move  it  rightward  until  something  larger  than  x  is  encountered. 
Similarly,  start  a  pointer  moving  leftward  from  the  right  end  until  something  smaller  than  x  is 
encountered.  Switch  the  two  items  and  start  the  pointers  moving  again.  When  the  pointers  reach 
the same item, everything to the left of the item is at most equal to x and everything to the right of it
is at least equal to x.
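
A two-pointer division of this kind can be sketched in Python. The version below is a common variant (often attributed to Hoare) and differs in small details from the description above: each pointer stops at items equal to x as well, the memorized item x is taken from the middle of the sublist, and the pointers may cross rather than meet. The names `partition` and `quicksort` are hypothetical.

```python
def partition(a, lo, hi):
    """Divide a[lo..hi] in place; return p so that every item in
    a[lo..p] is <= every item in a[p+1..hi]."""
    x = a[(lo + hi) // 2]            # the memorized item (choice assumed here)
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < x:              # move right until something >= x is found
            i += 1
        j -= 1
        while a[j] > x:              # move left until something <= x is found
            j -= 1
        if i >= j:                   # pointers have met or crossed
            return j
        a[i], a[j] = a[j], a[i]      # switch the two items


def quicksort(a, lo=0, hi=None):
    """Sort the list a in place."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p)          # sort the first sublist
        quicksort(a, p + 1, hi)      # sort the second sublist
```

Because the division is done in place and one sublist precedes the other, nothing remains to be done once the two recursive calls return.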

How  long  Quicksort  takes  depends  on  how  evenly  the  list  is  divided.  In  the  worst  case,  one  sublist 
has  one  item  and  the  remaining  items  are  in  the  other.  If  this  continues  through  each  division,  the 
number of comparisons needed to sort the original list is about $n^2/2$. In the best case, the two sublists
are as close to equal as possible. If this continues through each division, the number of comparisons
needed to sort the original list is about $n \log_2 n$. The average number of comparisons required is fairly
close to $n \log_2 n$, but it's a bit tricky to prove it. We'll discuss it in Example 10.12 (p. 289).

Heapsort 


We  give  a  rough  idea  of  the  nature  of  Heapsort.  A  full  discussion  of  Heapsort  is  beyond  the  scope  of 
this  text. 

To explain Heapsort, we must define a heap. This data structure was described in a rough form
above. Here is a complete, but terse, definition. A heap is a rooted binary tree with a bijection
between the vertices and the items that are being sorted. The tree and bijection f satisfy the
following.

• The smallest item in the image of f is associated with the root.

•  The  heights  of  the  sons  of  the  root  differ  by  at  most  one. 

• For each son v of the root, the subtree rooted at v and the bijection restricted to that subtree
form a heap.

Thus  the  smallest  case  is  a  tree  consisting  of  one  vertex.  This  corresponds  to  a  list  with  one  element. 

It  turns  out  that  it  is  fairly  easy  to  add  an  item  so  that  the  heap  structure  is  preserved  and  to 
remove  the  least  item  from  the  heap,  in  a  way  that  preserves  the  heap  structure.  We  will  not  discuss 
how  a  heap  is  implemented  or  how  these  operations  are  carried  out.  If  you  are  interested  in  such 
details,  see  a  text  on  data  structures  and  algorithms. 

Heapsort  creates  a  heap  from  the  unsorted  list  and  then  creates  a  sorted  list  from  the  heap  by 
removing  items  one  by  one  so  that  the  heap  structure  is  preserved.  Thus  the  divide  and  conquer 
method  in  Heapsort  involves  the  dividing  of  sorting  into  two  phases:  (a)  creating  the  heap  and 
(b) using the heap. Inserting or removing an item involves traversing a path between the root and
a leaf. Since the greatest distance to a leaf in a tree with k nodes is about $\log_2 k$, the creation of the
heap from an unsorted list takes about $\sum_{k=1}^{n} \log_2 k \approx n \log_2 n$ comparisons, as does the dismantling
of the heap to form the sorted list. Thus Heapsort is a reasonably fast sort.

Note that a heap is an intermediate data structure which is quickly constructed from an unsorted
list  and  which  quickly  leads  to  a  sorted  list.  This  observation  is  important.  If  we  are  in  a  situation 
where  we  need  to  continually  add  to  a  list  and  remove  the  smallest  item,  a  heap  is  a  good  data 
structure  to  use.  This  is  illustrated  in  the  following  example. 

Programming Note One way a heap can be implemented is with the classical form of a
heap — a  binary  RP-tree  in  which 

•  each  nonleaf  node  has  a  left  son  and,  possibly,  a  right  son; 

•  the  longest  and  shortest  distances  from  the  root  to  leaves  differ  by  at  most  one  and 

•  an  item  is  stored  at  each  node  in  such  a  way  that  an  item  at  a  node  precedes  each  of  its  sons 
in  the  sorted  order. 

Note  that  the  last  condition  implies  that  the  smallest  item  is  at  the  top  of  the  heap.  This  tree 
can  be  stored  as  an  array  indexed  from  1  to  n.  The  sons  of  the  item  in  position  k  are  in  positions 
2k  and  2k  +  1,  if  these  numbers  do  not  exceed  n.  With  clever  programming,  the  unsorted  list, 
the  heap  and  the  sorted  list  can  all  occupy  this  space,  so  no  extra  storage  is  needed. 
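
The array layout in this note (sons of position k at positions 2k and 2k + 1) can be sketched in Python as follows. This is a rough illustration only: the function names are invented, position 0 of the Python list is left unused so that indices match the note, and the sketch builds a separate output list instead of using the clever in-place arrangement mentioned above.

```python
def heap_insert(heap, item):
    """Add item to the heap stored in heap[1:], smallest item at position 1."""
    heap.append(item)
    k = len(heap) - 1
    # Move the new item up the path toward the root while it precedes its father.
    while k > 1 and heap[k] < heap[k // 2]:
        heap[k], heap[k // 2] = heap[k // 2], heap[k]
        k //= 2


def heap_remove_smallest(heap):
    """Remove and return the item at the root, preserving the heap structure."""
    smallest = heap[1]
    last = heap.pop()
    n = len(heap) - 1                 # number of items still in the heap
    if n >= 1:
        heap[1] = last
        k = 1
        while 2 * k <= n:             # follow a path toward a leaf
            child = 2 * k
            if child + 1 <= n and heap[child + 1] < heap[child]:
                child += 1            # pick the smaller son
            if heap[k] <= heap[child]:
                break
            heap[k], heap[child] = heap[child], heap[k]
            k = child
    return smallest


def heapsort(items):
    items = list(items)
    heap = [None]                     # position 0 is unused
    for item in items:                # phase (a): create the heap
        heap_insert(heap, item)
    # Phase (b): dismantle the heap, smallest item first.
    return [heap_remove_smallest(heap) for _ in range(len(items))]
```

Both helper functions touch only the positions on one root-to-leaf path, which is the source of the roughly $\log_2 k$ comparisons per operation.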
Exercises


8.2.1. (Binary Insertion Sort) Show the steps in a binary insertion sort of the list


9  15  6  12  3  7  11  5  14  1  10  4  2  13  16  8. 


8.2.2. (Binary Insertion Sort) Show that the number of comparisons needed to find the location where $u_t$
belongs  is  about  log2  t.  Use  this  to  show  that  about  n  log2  n  comparisons  are  needed  to  sort  an  n 
long  list  by  this  method. 
Hint.  Using  calculus,  one  can  show  that 


8.2.3. (Binary Insertion Sort) You are to construct the decision tree for the binary insertion sort. Label each
leaf with the unsorted list written in one line form, as was done in Figure 8.1. If we are comparing an
item with a sorted list of even length, there is no middle item. In this case, use the item just before
the middle. This problem is a bit tricky. Unlike the tree in Figure 8.1, not all vertices at the same
distance from the root involve the same comparison. As a help, you may want to label the vertices
of the decision tree with the comparisons that are being made; for example, i : j could mean that $u_i$
is being compared to $s_j$.

(a)  Construct  the  decision  tree  for  the  binary  insertion  sort  on  permutations  of  2. 

(b)  Construct  the  decision  tree  for  the  binary  insertion  sort  on  permutations  of  3. 

8.2.4.  (Bucket  Sort)  Show  the  steps  in  bucket  sorting  33,  41,  23,  14,  21,  24. 

8.2.5.  (Bucket  Sort) 

(a)  Carefully  state  an  algorithm  for  bucket  sorting  words  which  have  exactly  k  letters  each. 

(b)  Carefully  state  an  algorithm  for  bucket  sorting  words  which  have  at  most  k  letters  each. 

8.2.6.  (Bucket  Sort)  Use  induction  on  k  to  prove  that  the  algorithms  in  the  previous  problem  work. 
Hint. By induction, after k - 1 steps we have a list in which the items are in order if the first letter
is  ignored.  What  happens  to  this  order  within  each  of  the  piles  we  create  when  we  look  at  the  first 
letter? 

8.2.7. (Merge Sort) Using the programming suggestion in the text, show on paper how merge sorting with
4 tapes works for the list

9 15 6 12 3 7 11 5 14 1 10 4 2 13 16 8.

(See the footnote on page 227 for an explanation of "tape.")

8.2.8. (Merge Sort) Suppose we have n items stored on a single tape and that we can sort k items at one
time in memory. Explain how to reduce the number of times tapes need to be read for merge sort by
first sorting sets of k items in memory.

8.2.9. (Quicksort) Prove that the number of comparisons required when the list is split as unevenly as
possible is about $n^2/2$ as claimed. Prove that it is about $n \log_2 n$ when the splits are about as even
as possible.

8.2.10. (Quicksort) Suppose that each time a list of k items is divided in Quicksort the smaller piece
contains rk items and the larger contains (1 - r)k. Let Q(n) be the number of comparisons needed
to sort a list in this case.

(a) Show that Q(n) is about n + Q(rn) + Q((1 - r)n).

(b) Present a reasonable argument (i.e., it need not be quite a proof) that Q(n) is about $a n \ln n$
where

$$1 + a\bigl(r \ln r + (1 - r)\ln(1 - r)\bigr) = 0.$$

(c)  Verify  that  this  gives  the  correct  answer  when  the  list  is  divided  evenly  each  time  (r  =  1/2). 

(d)  It  can  be  shown  that  r  =  1/3  is  a  reasonable  approximation  for  average  behavior.  What  is  Q(n) 
in  this  case? 

8.3   Sorting  Networks 


In  sorting,  items  are  compared  and  actions  taken  as  a  result  of  the  comparison.  Since  items  must 
be  rearranged,  the  simplest  kind  of  action  one  might  visualize  is  to  either  do  nothing  or  interchange 
the  two  items  that  were  compared.  We  can  imagine  a  hardware  device  for  doing  this,  which  we  will 
call a comparator. It has two inputs, say $x_1$ and $x_2$, and two outputs, say $y_1$ and $y_2$, where $y_1$ is
whichever one of $x_1$ and $x_2$ should appear earlier in the sorted list and $y_2$ is the other. A simple
hardware  device  for  sorting  consists  of  a  network  of  interconnected  comparators.  This  is  called  a 
sorting  network.  For  example,  consider  the  following  algorithm  for  sorting  a  permutation  of  3. 

1.  If  the  entry  in  the  second  position  exceeds  the  entry  in  the  third  position,  switch  them. 

2.  If  the  entry  in  the  first  position  exceeds  the  entry  in  the  second  position,  switch  them. 

3.  If  the  entry  in  the  second  position  exceeds  the  entry  in  the  third  position,  switch  them. 

Figure  8.2  shows  a  sorting  network  to  accomplish  this.  The  data  enters  at  the  left  end  of  the  network 
and  moves  along  the  lines.  The  rearranged  data  emerges  at  the  right  end.  A  vertical  connection 
between  two  lines  represents  a  comparator — the  two  inputs  emerge  in  sorted  order. 
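
The three-step algorithm above can be modeled in software by listing its comparators in order. The sketch below is an illustration under assumed conventions: lines are numbered from 0 at the top, a comparator is written as a pair (i, j) with i < j, and the names `NETWORK` and `run_network` are invented here.

```python
from itertools import permutations

# Steps 1, 2 and 3 of the algorithm, with positions counted from 0.
NETWORK = [(1, 2), (0, 1), (1, 2)]


def run_network(comparators, values):
    """Send values through the comparators from left to right."""
    v = list(values)
    for i, j in comparators:
        if v[i] > v[j]:               # comparator outputs emerge in sorted order
            v[i], v[j] = v[j], v[i]
    return v


def sorts_every_permutation(comparators, n):
    """Brute-force check over all permutations of 1..n."""
    return all(run_network(comparators, p) == list(range(1, n + 1))
               for p in permutations(range(1, n + 1)))
```

For the inputs 3,2,1 the intermediate states are 3,1,2, then 1,3,2, then 1,2,3, and `sorts_every_permutation(NETWORK, 3)` checks the remaining five permutations as well.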

Some  of  the  questions  we'd  like  to  ask  about  sorting  networks  are 

•  How  fast  can  sorting  networks  be? 

•  How  many  comparators  are  needed? 

•  How  can  we  tell  if  a  network  sorts  correctly? 

8.3.1   Speed  and  Cost 


In  this  section  we'll  focus  on  the  first  two  questions  that  we  raised  above. 

Sorting  networks  achieve  speed  by  three  methods:  fast  comparators,  parallelism  and  pipelining. 
The  first  method  is  a  subject  for  a  course  in  hardware  design,  so  we'll  ignore  it.  Parallelism  is  built 
into  any  sorting  network;  we  just  haven't  realized  that  in  our  discussion  yet.  Pipelining  is  a  common 
design  technique  for  speeding  up  computers. 

Two things that will make a network cheaper to manufacture are decreasing the number of
comparators  it  contains  and  increasing  the  regularity  of  the  layout.  Thus  we  can  ask  how  many 
comparators  are  needed  and  if  they  can  be  arranged  in  a  regular  pattern. 


Figure 8.2 Left: A sorting network with inputs $a_i$ and outputs $z_i$. Vertical lines are comparators. Right:
What the network does to the inputs 3,2,1.


Figure  8.3    Left:  A  sorting  network  for  the  Bubble  Sort.  Right:  The  same  network,  but  faster. 


Figure  8.4  Left:  A  Brick  Wall  network  for  eight  items.  Right:  The  "Batcher"  network  for  eight  items  is 
faster. 


Parallelism 


All  the  algorithms  we've  discussed  so  far  have  been  carried  out  sequentially;  that  is,  one  thing  is  done 
at  a  time.  This  may  not  be  realistic  for  algorithms  on  a  supercomputer.  It  is  certainly  not  realistic 
for  sorting  networks  because  parallelism  is  implicit  in  their  design.  Compare  the  two  networks  in 
Figure  8.3.  They  both  perform  the  same  sorting,  but,  as  is  evident  from  the  second  picture,  some 
comparators  can  work  at  the  same  time. 

In  the  expanded  form  on  the  left  side,  it  is  easy  to  see  that  the  network  sorts  correctly.  The 
leftmost  set  of  comparators  rising  to  the  right  finds  the  first  item;  the  next  set,  the  second  item; 
the  next,  the  third.  The  rightmost  comparator  puts  the  last  two  items  in  order.  This  idea  can  be 
extended to obtain a network that will sort n > 1 items in 2n - 3 "time units," where a time unit
is  the  length  of  time  it  takes  a  comparator  to  operate.  Can  we  improve  on  this? 

A  natural  idea  is  to  fill  in  as  many  comparators  as  possible,  thereby  obtaining  a  brick  wall  type 
of  pattern  as  shown  in  Figure  8.4.  How  long  does  the  brick  wall  have  to  be?  It  turns  out  that,  for 
n  items,  it  must  have  length  n.  Can  we  do  better  than  a  brick  wall  if  the  comparators  connect 
two lines which are not adjacent? Yes, we can. Figure 8.4 shows a "Batcher sort," which is faster
than the brick wall. We'll explain Batcher sorts later. A vertical line spanning several horizontal
lines represents a comparator connecting the topmost with the bottommost. To avoid overlapping
lines in diagrams, comparators that are separated by a small horizontal distance in a diagram are
understood  to  all  be  operating  at  the  same  time.  The  brick  wall  in  Figure  8.4  takes  8  time  units  and 
the  Batcher  sort  takes  6. 
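
A brick wall network can be modeled by grouping comparators into layers, one layer per time unit. The sketch below assumes that the first layer joins lines 1 and 2, 3 and 4, and so on (the figure is not reproduced here, so this starting offset is a guess), counts lines from 0, and uses the invented names `brick_wall_layers` and `run_layers`.

```python
from itertools import permutations


def brick_wall_layers(n):
    """Return n layers; layer t lists the comparators used in time unit t."""
    return [[(i, i + 1) for i in range(t % 2, n - 1, 2)] for t in range(n)]


def run_layers(layers, values):
    """Send values through the network one time unit at a time."""
    v = list(values)
    for layer in layers:
        # Comparators within a layer touch disjoint lines, so hardware
        # could operate all of them at the same time.
        for i, j in layer:
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
    return v
```

For eight items, `len(brick_wall_layers(8))` is 8, matching the 8 time units quoted for the brick wall, and a brute-force check over all permutations of a small n gives some evidence that length n suffices.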
How Fast Can a Network Be?


Since  a  network  must  have  at  least  log2(n!)  comparators  and  since  at  most  n/2  comparators  can 
operate at the same time, a network must take at least log2(n!)/(n/2) ≈ 2 log2 n time units. It is
known  that  for  some  C  and  for  all  large  n,  there  exist  networks  that  take  no  more  than  C  log2  n 
time  units.  This  is  too  complicated  for  us  to  pursue  here. 

Pipelining  is  an  important  method  for  speeding  up  a  network  if  many  sets  of  items  must  be 
sorted.  Suppose  that  we  have  a  delay  unit  that  delays  an  item  by  the  length  of  time  it  takes  a 
comparator  to  operate.  Insert  delay  units  in  the  network  where  there  are  no  comparators  so  that  all 
the  items  move  through  the  network  together.  Once  a  set  of  items  has  passed  a  position,  a  new  set 
can  enter.  Thus  we  can  feed  a  new  set  of  items  into  the  network  each  time  unit.  The  first  set  will 
emerge  from  the  end  some  time  later  and  then  each  successive  set  will  take  just  one  additional  time 
unit  to  emerge.  For  example,  a  brick  wall  network  can  sort  one  set  of  1,000  items  in  1,000  time  units, 
but 50 sets take only 1,049 time units instead of 1,000 × 50. This technique is known as pipelining
because  the  network  is  thought  of  as  a  pipeline  through  which  things  are  moving. 
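
The arithmetic behind the pipelining example can be written out as a pair of small functions. The names `pipelined_time` and `unpipelined_time` are invented for this sketch, which assumes that one new set enters the network every time unit.

```python
def pipelined_time(depth, sets):
    """Time units until the last set emerges when a new set enters each time unit.

    The first set needs `depth` time units; each later set emerges
    one time unit after its predecessor.
    """
    return depth + (sets - 1)


def unpipelined_time(depth, sets):
    """Time units when each set must finish before the next one enters."""
    return depth * sets
```

With a brick wall for 1,000 items, `pipelined_time(1000, 50)` gives 1,049 while `unpipelined_time(1000, 50)` gives 50,000.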

Pipelining is used extensively in computer design. It is obvious in the "vector processing"
hardware of supercomputers, but it appears in some form in many central processing units. For example,
the Intel 8088 microprocessor can roughly be thought of as a two step pipeline: (i) the bus
interface unit fetches instructions and (ii) the execution unit executes them. (It's actually a bit more
complicated since the bus interface unit handles all memory read/write.)

How  Cheap  Can  a  Network  Be? 


Our  previous  suggestion  for  using  a  brick  wall  network  to  sort  1,000  items  could  be  expensive — it 
requires  500,000  comparators.  This  number  could  be  reduced  by  using  a  more  efficient  network; 
however,  we'd  probably  have  to  sacrifice  our  nice  regular  pattern.  There's  another  way  we  can 
achieve  a  dramatic  saving  if  we  are  willing  to  abandon  pipelining. 

The  brick  wall  network  is  simply  the  first  two  columns  of  comparators  repeated  about  n/2  times. 
We  could  make  a  network  that  consists  of  just  these  two  columns  with  the  output  feeding  back  in  as 
input.  Start  the  network  by  feeding  in  the  desired  values.  After  about  n  time  units  the  sorted  items 
will  simply  be  circulating  around  in  the  network  and  can  be  read  whenever  we  wish. 

If we insist on pipelining, how many comparators are needed? From the exercises in the next
section, you will see that a Batcher sorting network requires considerably less time and considerably
fewer comparators than a brick wall network; however, it is not best possible. Unlike software sorting,
there is a large gap between theory and practice in sorting nets: Theory provides a lower bound on
the number of comparators that are required and specific networks provide an upper bound. There
is a large gap between these two numbers. In contrast, the upper and lower bounds for pairwise
comparisons in software sorts differ from n ln n by constant factors of reasonable size.

Whether  or  not  we  keep  our  pipelining  capabilities,  we  face  a  variety  of  design  tradeoffs  in 
designing a VLSI chip to implement a sorting network. Among the issues are the number of
comparators, the distance between the lines a comparator must connect, regularity of the design and
delay  problems.  They  are  beyond  the  scope  of  our  text. 

Exercises 


8.3.1.  Find  the  fastest  network  you  can  for  sorting  3  things  and  prove  that  there  is  no  faster  network. 

8.3.2.  Find  the  fastest  network  you  can  for  sorting  4  things.  If  you've  read  the  next  section,  prove  that  your 
network  sorts  correctly. 
8.3.3. Find the fastest network you can for sorting 5 things. If you've read the next section, prove that your
network sorts correctly.

8.3.4.  Suppose  we  have  a  network  for  sorting  n  items  and  we  wish  to  sort  less  than  n.  How  can  we  use  the 
network  to  do  this? 

8.3.5. In this exercise, comparators are only hooked to adjacent lines as in the brick wall. Find a network
that will sort n inputs and that has as few comparators as you can manage.

Hint.  Look  at  some  small  networks  (e.g.,  n  =  3  and  n  =  4)  and  try  to  find  a  simple  pattern  to 
generalize.  If  you  read  ahead  a  bit  before  working  on  this  problem,  you  will  encounter  Theorem  8.3, 
which  will  help  you  prove  that  your  network  sorts. 

8.3.6. Here's an idea for an even cheaper sorting network using the brick wall.

(a) Construct a two time unit network that consists of the first two time units of a brick wall. Feed the
output of the network back as input. After some time f(n), the output will be sorted regardless
of the original input. Find the minimum value of f(n) and explain why this behaves like a brick
wall. (Note that using the network k times means that 2k time units have elapsed.)

(b) What can be done if we have such an arrangement for sorting n items and we wish to sort fewer
than n?

(c) When n is even, we can get by with even fewer comparators. Construct a network that compares
the items 2i - 1 and 2i for $1 \le i \le n/2$. Call the inputs $x_1, \ldots, x_n$ and the outputs $y_1, \ldots, y_n$.
Feed the output of the network back in as follows:

$$\text{new } x_k = \begin{cases} \text{old } y_n & \text{if } k = 1; \\ \text{old } y_{k-1} & \text{otherwise.} \end{cases}$$

At time 2k disable the comparator between lines 2k - 1 and 2k. Show that this sorts after some
number of steps. How many steps are needed in general?

8.3.2 Proving That a Network Sorts


In contrast to many software sorts, it is frequently difficult to prove that a network actually sorts all
inputs correctly. There is no panacea for this problem. The following two theorems are sometimes helpful.

Theorem 8.2 Zero-One Principle If a network correctly sorts all inputs of zeroes and
ones, then it correctly sorts all inputs.

Theorem 8.3 Adjacent Comparisons If the comparators in a network only connect
adjacent lines and if the network correctly sorts the reversed sequence n, ..., 2, 1, then it correctly
sorts all inputs.

We will prove the Zero-One Principle shortly. The proof of the other theorem is more complicated
and will not be given.

Since the Adjacent Comparisons Theorem requires that only one input be checked, it is quite
useful. Unfortunately, the comparators must connect adjacent lines. To see that this is needed,
consider a three input network in which the top and bottom lines are connected by a comparator
and there are no other comparators. It correctly sorts the inputs 1,2,3 and 3,2,1, but it does not sort
any other permutations of 1,2,3 correctly.

The Zero-One Principle may seem somewhat useless because it still requires that many inputs
be considered. We will see that it is quite useful for proving that a Batcher sort works.
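
The Zero-One Principle (Theorem 8.2) and the need for adjacency in the Adjacent Comparisons Theorem (Theorem 8.3) can both be explored by brute force on small networks. The sketch below is illustrative only: lines are numbered from 0, a comparator is a pair (i, j) with i < j, and the names `run_network`, `sorts_all_zero_one` and `sorts_all_permutations` are invented here.

```python
from itertools import permutations, product


def run_network(comparators, values):
    """Send values through the comparators from left to right."""
    v = list(values)
    for i, j in comparators:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


def sorts_all_zero_one(comparators, n):
    """Check the 2^n inputs of zeroes and ones."""
    return all(run_network(comparators, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))


def sorts_all_permutations(comparators, n):
    """Check all n! permutations of 1..n."""
    return all(run_network(comparators, p) == list(range(1, n + 1))
               for p in permutations(range(1, n + 1)))


THREE_STEP = [(1, 2), (0, 1), (1, 2)]   # a three-comparator network on adjacent lines
TOP_BOTTOM = [(0, 2)]                   # lone comparator joining top and bottom lines
```

For `THREE_STEP` both checks succeed, in agreement with the principle. For `TOP_BOTTOM` the inputs 1,2,3 and 3,2,1 come out sorted, yet the zero-one check fails (for example on 1,0,1), so the network does not sort.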
Figure 8.5 Left: A single comparator with arbitrary inputs. Right: Applying a nondecreasing function.


Proof:  We  now  prove  the  Zero-One  Principle.  If  the  network  fails  to  sort  some  sequence,  we  will 
show  how  to  construct  a  sequence  of  zeroes  and  ones  that  it  fails  to  sort. 

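The idea behind our proof is the following simple observation: Suppose that $f$ is a nondecreasing
function; then a comparator treats $f(s)$ and $f(t)$ the same as it does $s$ and $t$. In other words, where
the comparator outputs $\min(s,t)$ and $\max(s,t)$ for the inputs $s$ and $t$, it outputs $f(\min(s,t))$ and
$f(\max(s,t))$ for the inputs $f(s)$ and $f(t)$. This is illustrated in Figure 8.5. It is easy to show by
considering the three cases $s < t$, $s = t$ and $s > t$.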

Suppose a network has $N$ comparators, has inputs $x_1, \ldots, x_n$ and outputs $y_1, \ldots, y_n$. Let $f$ be
a nondecreasing function. We will use induction on $N$ to prove that if $f(x_1), \ldots, f(x_n)$ are fed into
the network, then $f(y_1), \ldots, f(y_n)$ emerge at the other end.

If $N = 0$, the result is obviously true since there are no comparators present.

Now suppose that $N > 0$ and that the result is true for $N - 1$. We will prove it for $N$. Focus on
one of the comparators that is used in the last time unit. Let network $\mathcal{N}_1$ be the original network with
this comparator removed and let network $\mathcal{N}_2$ be the original network with all but this comparator
removed. Let $z_1, \ldots, z_n$ be the output of $\mathcal{N}_1$ and use it as the input to $\mathcal{N}_2$. Clearly the output of
$\mathcal{N}_2$ is $y_1, \ldots, y_n$.

Let $f(x_1), \ldots, f(x_n)$ be input to the network and let $u_1, \ldots, u_n$ be the output. We can break
this into two pieces:

•  Let $f(x_1), \ldots, f(x_n)$ be the input to $\mathcal{N}_1$.
By the induction hypothesis, the output is $f(z_1), \ldots, f(z_n)$.

•  Let $f(z_1), \ldots, f(z_n)$ be the input to $\mathcal{N}_2$.
The output is then $u_1, \ldots, u_n$.

We must prove that $u_1, \ldots, u_n$ is $f(y_1), \ldots, f(y_n)$.

Recall that $\mathcal{N}_2$ consists of a single comparator. Let the input positions $i$ and $j$ go into the
comparator. If $k \ne i, j$ then that input position does not go through the comparator and so $f(z_k)$
is treated the same as $z_k$. Our observation for a single comparator applies to the input positions for
$z_i$ and $z_j$ since they feed into the comparator. This proves that $u_1, \ldots, u_n$ equals $f(y_1), \ldots, f(y_n)$
and so proves our claim about nondecreasing functions.

Now suppose that the network fails to sort $x_1, \ldots, x_n$. We will show how to construct a nondecreasing
0-1 valued function $f$ such that the network fails to sort $f(x_1), \ldots, f(x_n)$. Since the sort
fails, we must have $y_i > y_{i+1}$ for some $i < n$. Define

$$f(t) = \begin{cases} 0 & \text{if } t < y_i, \\ 1 & \text{if } t \ge y_i. \end{cases}$$

Since $y_{i+1} < y_i$, we have $f(y_{i+1}) = 0$. Because $f(y_i) = 1$, the network fails to sort $f(x_1), \ldots, f(x_n)$. □
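The construction at the end of the proof is easy to carry out mechanically. The following Python sketch assumes the same representation as before (a network is a list of comparators (i, j) with i < j on lines numbered from 0); the function name zero_one_witness is ours. Given an input that the network fails to sort, it builds the threshold function used in the proof and returns the corresponding input of zeroes and ones.

```python
def apply_network(network, values):
    """Apply comparators (i, j), i < j, in order; smaller value goes to line i."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


def zero_one_witness(network, xs):
    """Return a 0-1 input the network fails to sort, or None if xs is sorted.

    Follows the proof: find outputs with y[i] > y[i+1] and apply the
    nondecreasing function f(t) = 1 if t >= y[i], else 0, to the inputs.
    """
    ys = apply_network(network, xs)
    for i in range(len(ys) - 1):
        if ys[i] > ys[i + 1]:
            threshold = ys[i]
            return [1 if x >= threshold else 0 for x in xs]
    return None


# One comparator joining the top and bottom of three lines fails on 2, 1, 3.
witness = zero_one_witness([(0, 2)], [2, 1, 3])
```

Here the outputs are 2, 1, 3, so the threshold is 2 and the witness is 1, 0, 1, which the network leaves unchanged and therefore unsorted.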
The Batcher Sort


The  Batcher  sort  is  a  merge  sort  that  is  suitable  for  implementation  using  a  sorting  network.  Like  all 
merge sorts, it is defined recursively. Let $k = \lceil n/2 \rceil$ be the result of rounding $n/2$ up; for example,
$\lceil 3.5 \rceil = 4$. For a Batcher sort, the first $k$ items are sorted and the last $n - k$ are sorted. Then the
two sorted sublists are merged.

To write pseudocode for the Batcher sort, let Comparator(x, y) replace x with the smaller of x
and y and replace y with the larger. The Batcher sort for array $x_1, \ldots, x_n$ is as follows.

BSORT(x_1, ..., x_n)
    If n = 1
        Return
    End if
    k = ⌈n/2⌉
    BSORT(x_1, ..., x_k)          /* Do these in ... */
    BSORT(x_{k+1}, ..., x_n)      /* ... parallel. */
    BMERGE(x_1, ..., x_n)
    Return
End

BMERGE(x_1, ..., x_n)
    If n = 1
        Return
    End if
    If n = 2
        Comparator(x_1, x_2)
        Return
    End if
    /* Here k = ⌈n/2⌉, as in BSORT. */
    BMERGE2(x_1, ..., x_k; x_{k+1}, ..., x_n)
    For i = 1, 2, ..., ⌈n/2⌉ - 1
        Comparator(x_{2i}, x_{2i+1})      /* Do these in parallel. */
    End for
    Return
End

BMERGE2(x_1, ..., x_k; y_1, ..., y_j)
    BMERGE(x_1, x_3, x_5, ..., y_1, y_3, y_5, ...)      /* Do these in ... */
    BMERGE(x_2, x_4, x_6, ..., y_2, y_4, y_6, ...)      /* ... parallel. */
    Return
End

The first procedure is the usual form for a merge sort. The other two procedures are the Batcher
merge. They could be combined into one procedure, but keeping track of subscripts gets a bit messy.

The Batcher sort for eight items is shown in Figure 8.6 with some parts labeled. The part labeled
$S_2$ is four copies of a two item Batcher sort. The parts labeled $C_n$ are the comparators in BMERGE
on a list of $n$ items. From $S_2$ through $C_4$ inclusive is two copies of a four item Batcher sort, one for
the top four inputs and one for the bottom four. The parts labeled O and E are the odd and even
indexed entries being treated by BMERGE in the call of BMERGE2 with an eight long list.
Figure 8.6   The Batcher network for eight items. The labels in the figure are $S_2$, $C_4$, O, E and $C_8$.


To  prove  that  the  Batcher  sort  works,  it  suffices  to  prove  that  the  merge  part  works.  Why  is 
this?  Any  recursive  sort  with  a  merge  will  work:  The  sequence  is  split  in  two,  each  part  is  sorted 
and  the  two  parts  are  merged. 

A variation of the Zero-One Principle can be used to prove that the merge works. We need
only prove the merge for all sequences of zeroes and ones for which both the first and second halves
have been sorted. We remark that $j \le k \le j + 1$ whenever BMERGE2 is called. The idea of the proof
is to use induction on $n$. A key observation is that the numbers of zeroes in the two sequences that
BMERGE2 passes to BMERGE are practically the same: The number of zeroes in the sequence made
from the odd subscripted $x$'s and $y$'s less the number of zeroes in the other sequence is 0, 1 or 2.
One can then consider the three possible cases separately. This is left as an exercise.
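For small n, the correctness of the Batcher sort can also be checked by machine using the Zero-One Principle. The following Python sketch is our own reading of the procedures BSORT, BMERGE and BMERGE2; it is not taken from the text. It produces the list of comparators rather than moving data, lines are numbered from 0, and bmerge receives the two sorted blocks as separate lists of line numbers, which is an assumption made to avoid the subscript bookkeeping.

```python
from itertools import product


def apply_network(network, values):
    """Apply comparators (i, j), i < j, in order; smaller value goes to line i."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


def bsort(lines):
    """Comparators of a Batcher sort on the given increasing list of lines."""
    n = len(lines)
    if n == 1:
        return []
    k = (n + 1) // 2                      # n/2 rounded up
    return bsort(lines[:k]) + bsort(lines[k:]) + bmerge(lines[:k], lines[k:])


def bmerge(xs, ys):
    """Comparators that merge the sorted lines xs with the sorted lines ys."""
    lines = xs + ys
    n = len(lines)
    if n == 1:
        return []
    if n == 2:
        return [(lines[0], lines[1])]
    # BMERGE2: odd subscripted entries, then even subscripted entries.
    comps = bmerge(xs[0::2], ys[0::2]) + bmerge(xs[1::2], ys[1::2])
    # Comparator(x_{2i}, x_{2i+1}) for i = 1, ..., ceil(n/2) - 1 (1-based).
    for i in range(1, (n + 1) // 2):
        comps.append((lines[2 * i - 1], lines[2 * i]))
    return comps


def sorts_all_zero_one(network, n):
    """True if the network sorts every input of n zeroes and ones."""
    return all(apply_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))
```

For example, bsort([0, 1, 2]) gives the comparators (0, 1), (0, 2), (1, 2). The sketch ignores which comparators can operate in parallel; it records only an order in which applying them one at a time gives the same result.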

Exercises 


8.3.7.  Prove  that  if  the  comparators  in  a  network  only  connect  adjacent  lines  and  if  the  network  correctly 
sorts  all  sequences  that  consist  of  ones  followed  by  zeroes,  then  it  correctly  sorts  all  inputs. 

8.3.8.  (Brick  wall  network  correctness)  This  exercise  is  a  preparation  for  the  one  to  follow. 

(a)  Draw  a  diagram  to  show  how  the  sequence  5,4,3,2,1  moves  through  the  brick  wall  network.  Show 
how  to  cut  out  all  appearances  of  5  and  push  the  network  together  to  obtain  a  diagram  for 
4,3,2,1. 

(b)  Draw  diagrams  to  show  how  the  two  input  sequences  1,1,0  and  1,0,0  move  through  the  brick 
wall. 

(c)  Repeat  the  previous  part  for  the  sequences  1,1,1,0  and  1,1,0,0  and  1,0,0,0. 

8.3.9.  (Brick  wall  network  correctness)  Prove  that  the  brick  wall  correctly  sorts  all  inputs.  You  may  find 
an  idea  for  a  proof  in  the  previous  exercise. 

8.3.10.  Draw  networks  for  Batcher  sorts  for  n  up  to  8. 

8.3.11.  (Batcher merge correctness) Fill in the details of the proof that BMERGE works when $n$ is a power of
2; that is, if $x_1, \ldots, x_k$ are in order and $x_{k+1}, \ldots, x_n$ are sorted, then BMERGE($x_1, \ldots, x_n$) results in
a sorted list.

Hint. In this case, division always comes out even and $k = j = n/2$.

8.3.12.  (Batcher merge correctness) Fill in the details of the proof that BMERGE works for all $n$; that is,
if $x_1, \ldots, x_k$ are in order and $x_{k+1}, \ldots, x_n$ are sorted, then BMERGE($x_1, \ldots, x_n$) results in a sorted
list.
8.3.13.  (Batcher network time) Let $S(N)$ be the number of time units a Batcher network takes to sort $2^N$
things and let $M(N)$ be the number of time units a Batcher network takes to execute BMERGE on $2^N$
things.

(a)  Prove that $S(0) = 0$ and, for $N > 0$, $S(N) \le S(N - 1) + M(N)$.

(b)  Prove that $M(0) = 0$ and, for $N > 0$, $M(N) \le M(N - 1) + 1$.

(c)  Conclude that $S(N) \le N(N + 1)/2$.

(d)  What  can  you  say  about  the  time  it  takes  a  Batcher  network  to  sort  n  things  when  n  is  not  a 
power  of  two? 


Notes  and  References 


Most  books  on  combinatorial  algorithms  discuss  software  algorithms  for  sorting  and  a  few  discuss 
hardware  algorithms;  i.e.,  sorting  nets.  You  may  wish  to  look  at  the  references  at  the  end  of  Chapter  6. 
The  classic  reference  is  Knuth  [4]  where  you  can  find  more  details  on  the  various  software  sorts  that 
we've discussed. See also pp. 47-75 of Williamson [5].

"Expander graphs" are the basis of sorting networks that require only $O(n \ln n)$ comparators. At
present,  they  have  not  yet  led  to  any  practical  networks.  For  a  discussion  of  this  topic  see  Chapter  5 
of  the  book  by  Gibbons  and  Rytter  [3].  It  is  possible  to  construct  fairly  cheap  networks  based  on 
a  modified  Batcher  sort.  Like  the  brick  wall,  these  networks  consist  of  a  pattern  repeated  many 
times.  The  brick  wall  is  a  2-long  pattern  repeated  n/2  times.  The  modified  Batcher  network  is  a 
$(\log_2 n)$-long pattern repeated $\log_2 n$ times. See [2] for details. See also [1]. A proof of Theorem 8.3
can  be  found  on  p.  63  of  [5]. 
CHAPTER 9

Rooted  Plane  Trees 


Introduction 


The most important recursive definition in computer science may be the definition of a rooted plane
tree  (RP-tree)  given  in  Section  5.4.  For  convenience  and  emphasis,  here  it  is  again. 

Definition  9.1  Rooted  plane  trees  An  RP-tree  consists  of  a  set  of  vertices  each  of 
which  has  a  (possibly  empty)  linearly  ordered  list  of  vertices  associated  with  it  called  its  sons  or 
children.  Exactly  one  of  the  vertices  of  the  tree  is  called  the  root.  Among  all  such  possibilities, 
only  those  produced  in  the  following  manner  are  RP-trees. 

•  A  single  vertex  with  no  sons  is  an  RP-tree.  That  vertex  is  the  root. 

•  If $T_1, \ldots, T_k$ is an ordered list of RP-trees with roots $r_1, \ldots, r_k$ and no vertices in common,
then an RP-tree $T$ can be constructed by choosing an unused vertex $r$ to be the root, letting
its $i$th child be $r_i$ and forgetting that $r_1, \ldots, r_k$ were called roots.

In Example 7.9 (p. 206) we sketched a proof that this definition agrees with the one given in
Definition 5.12. Figure 9.1 illustrates the last stage in building an RP-tree by using our new recursive
definition. In that case $k = 3$.

We need a little more terminology. The RP-trees $T_1, \ldots, T_k$ in the definition are called the
principal subtrees of $T$. A vertex with no sons is called a leaf.
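The recursive definition translates directly into a data structure. The following Python sketch is one possible representation, not the only one: the class name RPTree, its fields and the helper join are our own choices. As a stand-in for the last stage of a construction with k = 3, the three principal subtrees are taken to be single vertices; that simplification is an assumption made for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class RPTree:
    """A rooted plane tree: a root plus an ordered list of principal subtrees."""
    root: str
    subtrees: list = field(default_factory=list)

    def is_leaf(self):
        """A vertex with no sons is a leaf."""
        return not self.subtrees

    def vertices(self):
        """Yield the root, then the vertices of each principal subtree in order."""
        yield self.root
        for subtree in self.subtrees:
            yield from subtree.vertices()


def join(new_root, trees):
    """Second rule of Definition 9.1: new_root becomes the father of the old roots."""
    used = [v for tree in trees for v in tree.vertices()]
    if new_root in used or len(used) != len(set(used)):
        raise ValueError("the vertices must all be distinct")
    return RPTree(new_root, list(trees))


# First rule three times (single vertices b, c, d), then the second rule once.
T = join("a", [RPTree("b"), RPTree("c"), RPTree("d")])
```

The check in join enforces the two conditions in the definition: the trees have no vertices in common and the new root is unused.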

The  fact  that  a  general  RP-tree  can  be  defined  recursively  opens  up  the  possibility  of  recursive 
algorithms  for  manipulating  general  classes  of  RP-trees.  Of  course,  we  are  frequently  interested  in 
special  classes  of  RP-trees.  Those  special  classes  which  can  be  defined  recursively  are  often  the  most 
powerful  and  elegant.  Here  are  three  such  classes. 

•  In  Example  7.14  (p.  213)  we  saw  how  a  local  description  could  be  associated  with  a  recursive 
algorithm.  In  Example  7.16  (p.  214)  a  local  description  was  expanded  into  a  tree  for  the  Tower 
of  Hanoi  procedure.  These  local  descriptions  are  simply  recursive  descriptions  of  RP-trees  that 
describe  the  algorithms.  A  leaf  is  the  "output"  of  the  algorithm.  In  these  two  examples,  a  leaf 
is  either  a  permutation  or  the  movement  of  a  single  washer,  respectively.  In  other  words: 

A  recursive  algorithm 
contains 
a  local  description 
which  is 

a  recursive  definition  of  some  RP-trees. 
Figure 9.1  The last stage in recursively building an RP-tree. Left: The trees $T_1$, $T_2$ and $T_3$ with roots b,
c and d. Right: The new tree $T$ with root a.


In  Section  9.1,  we'll  look  at  some  recursive  algorithms  for  traversing  RP-trees. 

•  Compilers are an important aspect of computer science. Associated with a statement in a
language is a "parse tree." This is an RP-tree in which the leaves are the parts of the language that
you actually see and the other vertices are grammatical terms. "Context free" grammars are
recursively defined and lead to recursively defined parse trees. In Section 9.2 we'll briefly look at
some simple parse trees.

•  Those  RP-trees  in  which  each  vertex  has  either  zero  or  two  sons  are  called  full  binary  RP-trees. 
We'll  study  them  in  Section  9.3,  with  emphasis  on  ranking  and  unranking  them.  (Ranking  and 
unranking  were  studied  in  Section  3.2.)  Since  the  trees  are  defined  recursively,  so  is  their  rank 
function. 

Except  for  a  reference  in  Example  9.12  (p.  263),  the  sections  of  this  chapter  can  be  read  independently 
of  one  another. 


9.1   Traversing  Trees 


A  tree  traversal  algorithm  is  a  systematic  method  for  visiting  all  the  vertices  in  an  RP-tree.  We've 
already seen a nonrecursive traversal algorithm in Theorem 3.5 (p. 85). As we shall soon see, a
recursive description is much simpler. It is essentially a local description.

Traversal algorithms fall into two categories called "breadth first" and "depth first," with depth
first being the more common type. After explaining the categories, we'll focus on depth first
algorithms.

The  left  side  of  Figure  9.2  shows  an  RP-tree.  Consider  the  right  side  of  Figure  9.2.  There  we 
see  the  same  RP-tree.  Take  your  pencil  and,  starting  at  the  root  a,  follow  the  arrows  in  such  a  way 
that  you  visit  the  vertices  in  the  order 

abebfjfkflfbacadgdhdida.    (9.1)

This  manner  of  traversing  the  tree  diagram,  which  extends  in  an  obvious  manner  to  any  RP-tree  T, 
is  called  a  depth-first  traversal  of  the  ordered  rooted  tree  T.  The  sequence  of  vertices  (9.1)  associated 
with  the  depth-first  traversal  of  the  RP-tree  T  will  be  called  the  depth-first  vertex  sequence  of  T 
and will be denoted by DFV(T). If you do a depth-first traversal of an RP-tree T and list the edges
encountered (list an edge each time your pencil passes its midpoint in the diagram), you obtain the
depth-first  edge  sequence  of  T,  denoted  by  DFE(T).  In  Figure  9.2,  the  sequence  DFE(T)  is 

{a,b}  {b,e}  {b,e}  {b,f}  {f,j}  {f,j}  {f,k}  {f,k}  {f,l}  {f,l}  {b,f}
{a,b}  {a,c}  {a,c}  {a,d}  {d,g}  {d,g}  {d,h}  {d,h}  {d,i}  {d,i}  {a,d}.
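Both sequences can be generated by a short recursive program. In the Python sketch below, a tree is written as a pair (root, list of principal subtrees); this encoding and the names FIG_9_2, dfv and dfe are our own. The shape of the tree is inferred from the sequence (9.1), since the figure itself is not reproduced here.

```python
# The RP-tree of Figure 9.2, as (root, [principal subtrees]).
FIG_9_2 = ("a", [
    ("b", [("e", []),
           ("f", [("j", []), ("k", []), ("l", [])])]),
    ("c", []),
    ("d", [("g", []), ("h", []), ("i", [])]),
])


def dfv(tree):
    """Depth-first vertex sequence: list a vertex every time it is visited."""
    root, subtrees = tree
    sequence = [root]
    for subtree in subtrees:
        sequence += dfv(subtree)
        sequence.append(root)        # we come back to the root after each subtree
    return sequence


def dfe(tree):
    """Depth-first edge sequence: list an edge every time it is crossed."""
    root, subtrees = tree
    sequence = []
    for subtree in subtrees:
        edge = (root, subtree[0])
        sequence.append(edge)        # going away from the root
        sequence += dfe(subtree)
        sequence.append(edge)        # coming back toward the root
    return sequence
```

Joining the output of dfv(FIG_9_2) into a string gives the sequence (9.1), and dfe(FIG_9_2) lists each of the 11 edges twice.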
Figure 9.2     Left: An RP-tree with root a. Right: Arrows show depth first traversal of the tree.


The other important linear order associated with RP-trees is called breadth-first order. This order
is  obtained,  in  the  case  of  Figure  9.2,  by  reading  the  vertices  or  edges  level  by  level,  starting  with 
the  root.  In  the  case  of  vertices,  we  obtain  the  breadth-first  vertex  sequence  (BFV(T)).  In  Figure  9.2, 
BFV(T)  =  abcdefghijkl.  Similarly,  we  can  define  the  breadth-first  edge  sequence  (BFE(T)). 

Although  we  have  defined  these  orders  for  trees,  the  ideas  can  be  extended  to  other  graphs. 
For  example,  one  can  use  a  breadth  first  search  to  find  the  shortest  (least  number  of  choices)  route 
out  of  a  maze:  Construct  a  decision  tree  in  which  each  vertex  corresponds  to  an  intersection  in  the 
maze.  (More  than  one  vertex  may  correspond  to  the  same  intersection.)  A  vertex  corresponding 
to an intersection already encountered in the breadth first search has no sons. The decisions at an
intersection  not  previously  encountered  are  all  possibilities  of  the  form  "follow  a  passage  to  the  next 
intersection." 

Example  9.1  Data  structures  for  tree  traversals  Depth-first  and  breadth-first  traversal 
have  data  structures  naturally  associated  with  their  computer  implementations. 

BFV(T) can be implemented by using a queue. A queue is a list from which items are removed
at the opposite end from which they are added (first in, first out). Checkout lines at markets are
queues. The root of the tree is listed and placed on the queue. As long as the queue is not empty,
remove the next vertex from the queue and place its sons on the queue. You should be able to modify
this to give BFE(T).

DFV(T)  can  be  implemented  by  using  a  stack.  A  stack  is  a  list  from  which  items  are  removed  at 
the  same  end  to  which  they  are  added  (last  in,  first  out).  They  are  used  in  computer  programming 
to  implement  recursive  code.  (See  Example  7.17  (p.  216).)  For  DFV,  the  root  of  the  tree  is  listed 
and  placed  on  the  stack.  As  long  as  the  stack  is  not  empty,  remove  the  vertex  that  is  on  the  top  of 
the  stack  from  the  stack  and  place  its  sons,  in  order,  on  the  stack  so  that  leftmost  son  is  on  the  top 
of  the  stack. 

When a vertex is added to or removed from the data structure, you may want to take some
action; otherwise, you will have traversed the tree without accomplishing anything.
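Here is a minimal Python sketch of the two implementations described in this example. It uses the standard library's collections.deque as the queue and an ordinary list as the stack; a tree is again written as a pair (root, list of principal subtrees), an encoding of our own. In this sketch the action taken is to list a vertex when it is removed from the data structure. Note that the stack version lists each vertex only once, so what it produces is the order in which vertices are first met rather than the full sequence (9.1).

```python
from collections import deque

# The RP-tree of Figure 9.2, as (root, [principal subtrees]).
FIG_9_2 = ("a", [
    ("b", [("e", []),
           ("f", [("j", []), ("k", []), ("l", [])])]),
    ("c", []),
    ("d", [("g", []), ("h", []), ("i", [])]),
])


def breadth_first(tree):
    """List the vertices level by level, using a queue (first in, first out)."""
    order = []
    queue = deque([tree])
    while queue:
        root, subtrees = queue.popleft()
        order.append(root)
        queue.extend(subtrees)             # sons join the back of the queue
    return order


def depth_first(tree):
    """List each vertex when it is first met, using a stack (last in, first out)."""
    order = []
    stack = [tree]
    while stack:
        root, subtrees = stack.pop()
        order.append(root)
        stack.extend(reversed(subtrees))   # leftmost son ends up on top
    return order
```

The only difference between the two functions is the end from which the next tree is removed.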

Depth  First  Traversals 


In  a  tree  traversal,  we  often  want  to  process  either  the  vertices  or  edges  and  do  so  either  the  first  or 
last time we encounter them. If you process something only the first time it is encountered, this is
a preorder traversal; if you process it only the last time it is encountered, this is a postorder traversal.
This leads to four concepts:

PREV(T)     preorder vertex sequence;
POSTV(T)    postorder vertex sequence;
PREE(T)     preorder edge sequence;
POSTE(T)    postorder edge sequence.
Here's the promised recursive algorithm for depth first traversal of a tree. The sequences PREV,
POSTV, PREE and POSTE are initialized to empty. They are "global variables," so all levels of
the recursive call are working with the same four sequences.

Procedure DFT(T)
    Let r be the root of T
    Append vertex r to PREV                          /* PREV */
    Let k be the number of principal subtrees of T
    /* By convention, the For loop is skipped if k = 0. */
    For i = 1, 2, ..., k
        Let T_i be the ith principal subtree of T
        Let r_i be the root of T_i
        Append edge {r, r_i} to PREE                 /* PREE */
        DFT(T_i)
        Append edge {r, r_i} to POSTE                /* POSTE */
    End for
    Append vertex r to POSTV                         /* POSTV */
    Return
End

For example, in Figure 9.2, PREV(T) = abefjklcdghi and POSTV(T) = ejklfbcghida. Our
pseudocode is easily modified to do just some of these traversals: Simply cross out those "Append" lines
whose comments refer to traversals you don't want.

When a tree is being traversed, the programmer normally does not want a list of the vertices
or edges. Instead, he wants to take some sort of action. Thus "Append vertex v ..." and "Append
edge {u,v} ..." would probably be replaced by something like "DoVERTEX(v)" and "DoEDGE(u, v),"
respectively.
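The procedure DFT can be sketched in Python as follows. Two choices here are ours and not part of the text: a tree is written as a pair (root, list of principal subtrees), and the four sequences are passed as arguments instead of being global variables, which has the same effect because all levels of the recursion append to the same four lists.

```python
# The RP-tree of Figure 9.2, as (root, [principal subtrees]).
FIG_9_2 = ("a", [
    ("b", [("e", []),
           ("f", [("j", []), ("k", []), ("l", [])])]),
    ("c", []),
    ("d", [("g", []), ("h", []), ("i", [])]),
])


def dft(tree, prev, postv, pree, poste):
    """Depth first traversal that fills in all four sequences at once."""
    root, subtrees = tree
    prev.append(root)                      # first encounter with the vertex
    for subtree in subtrees:               # skipped when there are no subtrees
        edge = (root, subtree[0])
        pree.append(edge)                  # first encounter with the edge
        dft(subtree, prev, postv, pree, poste)
        poste.append(edge)                 # last encounter with the edge
    postv.append(root)                     # last encounter with the vertex


PREV, POSTV, PREE, POSTE = [], [], [], []
dft(FIG_9_2, PREV, POSTV, PREE, POSTE)
```

To take an action instead of building a list, each append can be replaced by a call to a function of the programmer's choosing.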

Example 9.2  Reconstructing trees  Does a sequence like PREV(T) have enough
information to reconstruct the tree T from the sequence? Of course, one might replace PREV with other
possibilities such as POSTV.

The answer is "no". One way to show this is by finding two trees T and U with PREV(T) =
PREV(U), or using whatever other possibility one wants in place of PREV. The trouble with this
approach is that we need a new example for each case. We'll use the Pigeonhole Principle (p. 55)
to give a proof that is easily adapted to other situations. Given a set $V$ of $n$ labels for the vertices,
there are $n!$ possible sequences for PREV since each such sequence must be a permutation of $V$.
Let $B$ be the set of these permutations and let $A$ be the set of RP-trees with vertex set $V$. Then
PREV : $A \to B$ and we want to show that two elements of $A$ map to the same permutation. By the
Pigeonhole Principle, it suffices to show that $|A| > |B|$. We already know that $|B| = n!$. There are
$n^{n-2}$ trees by Example 5.10 (p. 143). Since many RP-trees may give the same tree (root and order
information is removed), we have $|A| > n^{n-2}$. We need to show that $n! < n^{n-2}$ for some value of $n$.
This can be done by finding such an $n$ or by using Stirling's Formula, Theorem 1.5 (p. 12). We leave
this to you. □
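If you would like to check your answer to the last step by machine, a search of the following kind will do. This Python sketch simply compares n! with n^(n-2) for small n; the function name and the search limit are arbitrary choices of ours.

```python
from math import factorial


def first_n_with_fewer_permutations(limit=20):
    """Smallest n in 2..limit with n! < n**(n - 2), or None if there is none."""
    for n in range(2, limit + 1):
        if factorial(n) < n ** (n - 2):
            return n
    return None
```

Any value of n returned by this function is one for which there are fewer permutations than RP-trees, so some two RP-trees on n vertices must share the same preorder vertex sequence.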
Example 9.3  Graph traversal and spanning trees  One can do depth first traversal of a
graph to construct a lineal spanning tree, a concept defined in Definition 6.2 (p. 153). The following
algorithm finds a lineal spanning tree with root r for a connected simple graph G = (V, E). The
graph  is  a  "global  variable,"  so  changes  to  it  affect  all  the  levels  of  the  recursive  calls.  When  a  vertex 
is  removed  from  G,  so  are  the  incident  edges.  The  comments  refer  to  the  proof  of  Theorem  6.2 
(p.  153).  We  leave  it  to  you  to  prove  that  the  algorithm  does  follow  the  proof  as  claimed  in  the 
comments. 

/* Generate a lineal spanning tree of G rooted at r. */
LST(r)
    If (no edges contain r)
        Remove r from G
        Return the 1 vertex tree with root r
    End if
    /* f = {r, s} is as in the proof. */
    Choose {r, s} ∈ E
    Remove r from G, saving it and its incident edges
    /* S corresponds to T(A) in the proof. */
    S = LST(s)
    Restore r and the saved edges whose ends are still in G
    /* R corresponds to T(B) in the proof. */
    R = LST(r)
    Join S to R by an edge {r, s} to obtain a new tree T
    Remove r from G
    Return T
End □
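The algorithm LST can be tried out with the following Python sketch. Several details are assumptions of ours: the graph is a dictionary that maps each vertex to the set of its neighbors, it is passed as an argument and is used up by the call, the edge {r, s} is chosen by taking the smallest neighbor, and trees are written as pairs (root, list of principal subtrees) with S placed first among the subtrees at r.

```python
def remove_vertex(graph, v):
    """Delete v and its incident edges; return the set of v's former neighbors."""
    neighbors = graph.pop(v)
    for u in neighbors:
        graph[u].discard(v)
    return neighbors


def lst(graph, r):
    """Lineal spanning tree, rooted at r, of the part of graph reachable from r."""
    if not graph[r]:                       # no edges contain r
        remove_vertex(graph, r)
        return (r, [])
    s = min(graph[r])                      # any neighbor of r would do
    saved = remove_vertex(graph, r)
    S = lst(graph, s)
    # Restore r and the saved edges whose other ends are still in the graph.
    graph[r] = {v for v in saved if v in graph}
    for v in graph[r]:
        graph[v].add(r)
    _, subtrees = lst(graph, r)            # this call also removes r again
    return (r, [S] + subtrees)             # join S to R by the edge {r, s}


# A triangle a, b, c with an extra edge {c, d}.
G = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
tree = lst(G, "a")
```

For this graph the result is the path a, b, c, d. The edge {a, c} is not in the tree, but it joins c to one of its ancestors, as required of a lineal spanning tree. The final "Remove r from G" of the pseudocode needs no separate line in the sketch because the recursive call on r has already removed it.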

Example 9.4  Counting RP-trees  When doing a depth first traversal of an unlabeled RP-tree,
imagine listing the direction of each step: a for away from the root and t for toward the root. What
can we see in this sequence of a's and t's?

Each edge contributes one a and one t since it is traversed in each direction once. If the tree
has $n$ edges, we get a $2n$-long sequence containing $n$ copies of a and $n$ copies of t, say $s_1, \ldots, s_{2n}$. If
$s_1, \ldots, s_k$ contains $d_k$ more a's than t's, we will be a distance $d_k$ from the root after $k$ steps because
we've taken $d_k$ more steps away from the root than toward it. In particular $d_k \ge 0$ for all $k$.

Thus a tree with $n$ edges determines a unique sequence of $n$ a's and $n$ t's in which each initial
part of the sequence contains at least as many a's as t's. Furthermore, you should be able to see how
to reconstruct the tree if you are given such a sequence. Thus there is a bijection between the trees
and the sequences. It follows from the first paragraph of Example 1.13 (p. 15) that the number of
$n$-edge unlabeled RP-trees is $C_n$, the Catalan number.

Since a tree with $n$ vertices has $n - 1$ edges, it follows that the number of $n$-vertex unlabeled
RP-trees is $C_{n-1}$. By Exercise 9.3.12 (p. 266), it follows that the number of unlabeled binary RP-trees
with $n$ leaves is $C_{n-1}$. Thus the solution to Exercise 9.3.13 provides a formula for the Catalan
numbers. □
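The correspondence between trees and sequences can be explored with the following Python sketch. An unlabeled RP-tree is written as the list of its principal subtrees, so a single vertex is the empty list; this encoding and the function names are our own. The count is compared with the usual closed form for the Catalan numbers, $C_n = \binom{2n}{n}/(n+1)$, which we quote as a known fact.

```python
from itertools import product
from math import comb


def steps(tree):
    """The a/t sequence of an unlabeled RP-tree given as a list of subtrees."""
    return "".join("a" + steps(subtree) + "t" for subtree in tree)


def tree_from_steps(sequence):
    """Rebuild the tree from its a/t sequence."""
    root = []
    path = [root]                      # vertices from the root to where we are
    for step in sequence:
        if step == "a":                # move away from the root to a new son
            son = []
            path[-1].append(son)
            path.append(son)
        else:                          # move back toward the root
            path.pop()
    return root


def count_sequences(n):
    """Number of 2n-long a/t sequences that come from an n-edge RP-tree."""
    count = 0
    for sequence in product("at", repeat=2 * n):
        depth = 0
        for step in sequence:
            depth += 1 if step == "a" else -1
            if depth < 0:              # more t's than a's in an initial part
                break
        else:
            if depth == 0:             # equally many a's and t's overall
                count += 1
    return count


def catalan(n):
    return comb(2 * n, n) // (n + 1)
```

For example, the tree with shape [[], [[], []]] has the sequence "ataatatt", and tree_from_steps recovers the shape from that sequence.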

Exercises 

9.1.1.  Write pseudocode for recursive algorithms for PREV(T), PREE(T) and POSTE(T).

9.1.2.  In the case of decision trees we are only interested in visiting the leaves of the tree. Call the resulting
sequence DFL(T). Write pseudocode for a recursive algorithm for DFL(T). What is the connection
with the traversal algorithm in Theorem 3.5 (p. 85)?
9.1.3.  Write pseudocode to implement breadth first traversal using a queue. Call the operations for adding
to  the  queue  and  removing  from  the  queue  INQUEUE  and  OUTQUEUE,  respectively. 

9.1.4.  Write  pseudocode  to  implement  PREV(T)  using  a  stack.  Call  the  operations  for  adding  to  the  stack 
and  removing  from  the  stack  PUSH  and  POP,  respectively. 

9.1.5.  Construct  a  careful  proof  that  the  algorithm  in  Example  9.3  does  indeed  construct  a  lineal  spanning 
tree. 

9.1.6.  In contrast to Example 9.2, if PREV(T) = PREV(U) and POSTV(T) = POSTV(U), then T = U.
Another way to state this is:

Given PREV(T) and POSTV(T), we can reconstruct T.    (9.2)

The goal of this exercise is to prove (9.2) by induction on the number of vertices.

Let $n = |T|$, the number of vertices of T. Let $v$ be the root of T. Let PREV(T) = $a_1, \ldots, a_n$ and
POSTV(T) = $z_1, \ldots, z_n$.

If $n > 1$, we may suppose that T consists of the RP-trees $T_1, \ldots, T_k$ joined to the root $v$. Let U
be the RP-tree with root $v$ joined to the trees $T_2, \ldots, T_k$. (If $k = 1$, U is just the single vertex $v$.)

(a)  Prove (9.2) is true when $n = 1$.

(b)  Prove that $a_2$ is the root of $T_1$.

(c)  Suppose $z_t = a_2$. (This must be true for some $t$ since PREV and POSTV are both permutations
of the vertex set.) Prove that POSTV($T_1$) = $z_1, \ldots, z_t$.

(d)  With $t$ as above, prove that PREV($T_1$) = $a_2, \ldots, a_{t+1}$.
Hint. How many vertices are there in $T_1$?

(e)  Prove that PREV(U) = $v, a_{t+2}, \ldots, a_n$ and POSTV(U) = $z_{t+1}, \ldots, z_n$.

(f)  Complete the proof.

9.1.7.  In  Example  9.2,  we  proved  that  a  tree  cannot  be  reconstructed  from  the  sequence  PREV.  The  proof 
also  works  for  POSTV. 

(a)  Find two different RP-trees T and U with PREV(T) = PREV(U).

(b)  Find two different RP-trees T and U with POSTV(T) = POSTV(U).

9.1.8.  Use  Exercise  9.1.6  to  write  pseudocode  for  a  recursive  procedure  to  reconstruct  a  tree  from  PREV 
and  POSTV. 

9.1.9.  If T is an unlabeled RP-tree, define a sequence $D(T)$ of ±1's as follows. Perform a depth first traversal
of the tree. Whenever an edge is followed from father to son, list a +1. Whenever an edge is followed
from son to father, list a −1.

(a)  Let $T_1, \ldots, T_m$ be unlabeled RP-trees and let T be the tree formed by joining the roots of each
of the $T_i$'s to a new root to form a new unlabeled RP-tree. Express $D(T)$ in terms of $D(T_1), \ldots,
D(T_m)$.

(b)  If $D(T) = s_1, \ldots, s_n$, show that $n$ is twice the number of edges of T and show that $\sum_{i=1}^{k} s_i \ge 0$
for all $k$ with equality when $k = n$.

*(c)  Let $s = s_1, \ldots, s_n$ be a sequence of ±1's. Show that $s$ comes from at most one tree and that it
comes from a tree if and only if $\sum_{i=1}^{k} s_i \ge 0$ for all $k$, with equality when $k = n$.
Figure 9.3    Interpreting the expression (A + 5) * (2 + (3 * X)). (a) The first step. (b) An intermediate step.
(c) The final result.


9.2   Grammars  and  RP-Trees 


Languages are important in computer science. Compilers convert programming languages into
machine code. Automatic translation projects are entangled by the intricacies of natural languages.
Researchers in artificial intelligence are interested in how people manage to understand language.
We  look  briefly  at  a  simple,  small  part  of  all  this:  "context-free  grammars"  and  "parse  trees."  To 
provide  some  background  material,  we'll  look  at  arithmetic  expressions  and  at  simple  sentences. 

Unfortunately, we'll be introducing quite a bit of terminology. Fortunately, you won't need most
of  it  for  later  sections,  so,  if  you  forget  it,  you  can  simply  look  up  what  you  need  in  the  index  at  the 
time  it  is  needed. 

Example 9.5    Arithmetic expressions     Let's consider the meaning of the expression
(A + 5) * (2 + (3 * X)).

It means A + 5 times 2 + (3 * X), which we can represent by the RP-tree in Figure 9.3(a). We
can then interpret the subexpressions and replace them by their interpretations and so on until we
obtain Figure 9.3(c). We can represent this recursive procedure as follows, where "exp" is short for
"expression".

INTERPRET(exp)
    If (exp has no operation)
        Return the RP-tree with root exp
        and no other vertices.
    End if
    Let exp = (exp_l op exp_r).
    Return the RP-tree with root op,
        left principal subtree INTERPRET(exp_l) and
        right principal subtree INTERPRET(exp_r).
End

Now  suppose  we  wish  to  evaluate  the  expression,  with  a  procedure  we'll  call  EVALUATE.  One 
way  of  doing  that  would  be  to  modify  INTERPRET  slightly  so  that  it  returns  values  instead  of  trees. 
A  less  obvious  method  is  to  traverse  the  tree  generated  by  INTERPRET  in  POSTV  order.  Each  leaf 
is  replaced  by  its  value  and  each  nonleaf  is  replaced  by  the  value  of  performing  the  operation  it 
indicates  on  the  values  of  its  two  sons. 

To illustrate this, POSTV of Figure 9.3(c) is A 5 + 2 3 X * + *. Thus we replace the leaves
labeled A and 5 with their values, then the leftmost vertex labeled + with the value of A + 5, and
so on. POSTV of a tree associated with calculations is called postorder notation or reverse Polish
notation. It is used in some computer languages such as Forth and PostScript® and in some handheld
calculators. □
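The two procedures of this example can be sketched in Python as follows. The sketch makes several simplifying assumptions of its own: expressions are fully parenthesized (so an extra outer pair of parentheses is added to the expression above), the only operations are + and *, numbers are integers, and the values of names such as A and X are supplied in a dictionary. A tree is written as a pair (root, list of principal subtrees).

```python
import operator
import re

OPERATIONS = {"+": operator.add, "*": operator.mul}


def tokenize(text):
    """Split an expression into parentheses, operations, names and numbers."""
    return re.findall(r"[()+*]|\w+", text)


def interpret(tokens):
    """INTERPRET: consume one expression from the token list, return its tree."""
    token = tokens.pop(0)
    if token != "(":                      # exp has no operation: a leaf
        return (token, [])
    left = interpret(tokens)
    op = tokens.pop(0)
    right = interpret(tokens)
    tokens.pop(0)                         # discard the closing ")"
    return (op, [left, right])


def postv(tree):
    """Postorder vertex sequence of the tree."""
    root, subtrees = tree
    return [v for subtree in subtrees for v in postv(subtree)] + [root]


def evaluate(tree, values):
    """EVALUATE: process the vertices in POSTV order, keeping values on a stack."""
    stack = []
    for label in postv(tree):
        if label in OPERATIONS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATIONS[label](left, right))
        elif label in values:
            stack.append(values[label])
        else:
            stack.append(int(label))
    return stack.pop()


tree = interpret(tokenize("((A + 5) * (2 + (3 * X)))"))
```

With A = 1 and X = 4, evaluate(tree, {"A": 1, "X": 4}) computes (1 + 5) * (2 + (3 * 4)) = 84. The stack plays the role of "replacing a vertex by its value": when an operation is reached, the values of its two sons are the top two items on the stack.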
Example 9.6    Very simple English    Languages are very complicated creations; however, we can
describe a rather simple form of an English sentence as follows:

(a)  One type of sentence consists of the three parts noun-phrase, verb, noun-phrase and a period
in  that  order. 

(b)  A  noun-phrase  is  either  a  noun  or  an  adjective  followed  by  a  noun-phrase. 

(c)  Lists  of  acceptable  words  to  use  in  place  of  verb,  noun  and  adjective  are  available. 

This  description  is  a  gross  oversimplification.  It  could  lead  to  such  sentences  as  "big  brown 
small  houses  sees  green  green  boys."  The  disagreement  between  subject  and  verb  (plural  versus 
singular)  could  be  fixed  rather  easily,  but  the  nonsense  nature  of  the  sentence  is  not  so  easily  fixed. 
If  we  agree  to  distinguish  between  grammatical  correctness  and  content,  we  can  avoid  this  problem. 
(As we shall see in a little while, this is not merely a way of defining our difficulty out of existence.) □

Let's  rephrase  our  rules  from  the  last  example  in  more  concise  form.  Here's  one  way  to  do  that. 

(a) sentence → noun-phrase verb noun-phrase .

(b) noun-phrase → adjective noun-phrase | noun

(c) adjective → big | small | green | ...
    noun → houses | boys | ...
    verb → sees | ...

These  rules  along  with  the  fact  that  we  are  interested  in  sentences  are  what  is  called  a  "context-free 
grammar." 

Definition  9.2   Context-free  grammar    A  context-free  grammar  consists  of 

1.  a  finite  set  S  of  nonterminals; 

2.  a  finite  set  T  of  terminals; 

3. a start symbol $s_0 \in S$;

4. a finite set of productions of the form $s \to x_1 \dots x_n$ where $s \in S$ and $x_i \in S \cup T$. We
allow $n = 0$, in which case the production is "$s \to$".

We'll distinguish between terminal and nonterminal symbols by writing them in the fonts "terminal"
and "nonterminal." If we don't know whether or not an element is terminal, we will use
the nonterminal font. (This can happen in a statement like $x \in S \cup T$.)

In  the  previous  example,  the  start  symbol  is  sentence  and  the  productions  are  given  in  (a)-(c). 

The  productions  of  the  grammar  give  the  structural  rules  for  building  the  language  associated 
with  the  grammar.  This  set  of  rules  is  called  the  syntax  of  the  language.  The  grammar  is  called 
context-free  because  the  productions  do  not  depend  on  the  context  (surroundings)  in  which  the 
nonterminal  symbol  appears.  Natural  languages  are  not  context-free,  but  many  computer  languages 
are  nearly  context-free. 

Any string of symbols that can be obtained from the start symbol (sentence above) by repeated
substitutions  using  the  productions  is  called  a  sentential  form. 

Definition 9.3  The language of a grammar  A sentential form consisting only of terminal
symbols is a sentence. The set of sentences associated with a grammar G is called the language
of the grammar and is denoted L(G).
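To make sentential forms and sentences concrete, here is a small Python sketch that carries out a random leftmost derivation in the grammar of Example 9.6. It is an illustration only: the dictionary GRAMMAR and the function derivation are our own names, and the word lists contain just the words actually listed in rule (c).

```python
import random

# Productions (a)-(c); each nonterminal maps to its list of right sides.
GRAMMAR = {
    "sentence":    [["noun-phrase", "verb", "noun-phrase", "."]],
    "noun-phrase": [["adjective", "noun-phrase"], ["noun"]],
    "adjective":   [["big"], ["small"], ["green"]],
    "noun":        [["houses"], ["boys"]],
    "verb":        [["sees"]],
}

def derivation(start, rng):
    """Return the sentential forms of a random leftmost derivation from start."""
    form = [start]
    forms = [form]
    while any(x in GRAMMAR for x in form):
        # Find the leftmost nonterminal and substitute one of its right sides.
        i = next(j for j, x in enumerate(form) if x in GRAMMAR)
        rhs = rng.choice(GRAMMAR[form[i]])
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms

for form in derivation("sentence", random.Random(1)):
    print(" ".join(form))
```

Every line printed is a sentential form; the last one contains only terminal symbols and so is a sentence of L(G).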
noun-phrase
├── adjective
│   └── little
└── noun-phrase
    ├── adjective
    │   └── red
    └── noun-phrase
        └── noun
            └── houses

Figure  9.4   The  parse  tree  for  the  sentence  little  red  houses. 


Processes that obtain one string of symbols from another by repeated applications of productions
are called derivations. To indicate that little red noun can be derived from noun-phrase, we write

noun-phrase ⇒ little red noun.

Thus

sentence ⇒ big brown small houses sees green green boys.

We  can  represent  productions  by  RP-trees  where  a  vertex  is  the  left  side  of  a  production  and 
the  leaves  are  the  items  on  the  right  side;  for  example, 

noun-phrase → adjective noun-phrase

becomes

         noun-phrase
          /        \
  adjective     noun-phrase

The productions thus become local descriptions for trees corresponding to derivations
such  as  that  shown  in  Figure  9.4.  We  call  such  a  tree  a  parse  tree.  The  string  that  was  derived  from 
the  root  of  the  parse  tree  is  the  DFV  sequence  of  the  tree.  A  sentence  corresponds  to  a  parse  tree 
in  which  the  root  is  the  start  symbol  and  all  the  leaves  are  terminal  symbols. 

How  is  Figure  9.3  related  to  parse  trees?  There  certainly  seems  to  be  some  similarity,  but  all 
the  symbols  are  terminals.  What  has  happened  is  that  the  parse  tree  has  been  squeezed  down  to 
eliminate  unnecessary  information.  Before  we  can  think  of  Figure  9.3  as  coming  from  a  parse  tree, 
we  need  to  know  what  the  grammar  is.  Here's  a  possibility,  with  exp  (short  for  expression)  the  start 
symbol,  number  standing  for  any  number  and  id  standing  for  any  identifier,  i.e.,  the  name  of  a 
variable. 

exp → term | term op term
term → ( exp ) | id | number
op → + | - | * | /

A computer language has a grammar. The main purpose of a compiler is to translate
grammatically correct code into machine code. A secondary purpose is to give the programmer useful
messages  when  nongrammatical  material  is  encountered.  Whether  grammatically  correct  statements 
make  sense  in  the  context  of  what  the  programmer  wishes  to  do  is  beyond  the  ken  of  a  compiler; 
that  is,  a  compiler  is  concerned  with  syntax,  not  with  content. 

Compilers  must  use  grammars  backwards  from  the  way  we've  been  doing  it.  Suppose  you  are 
functioning  as  a  general  purpose  compiler.  You  are  given  the  grammar  and  a  string  of  terminal 
symbols. Instead of starting with the start symbol, you must start with the string of terminals and
work backwards, creating the parse tree from the terminals at the leaves back to the start symbol at
the root. This is called parsing. A good compiler must be able to carry out this process or something
like it quite rapidly. For this reason, attention has focused on grammars which are quickly parsed
and  yet  flexible  enough  to  be  useful  as  computer  languages. 

When  concerned  with  parsing,  one  should  read  the  productions  backwards.  For  example, 

term → ( exp ) | id | number

would  be  read  as  the  three  statements: 

•  If  one  sees  the  string  "(  exp  )",  it  may  be  thought  of  as  a  term. 

•  If  one  sees  an  id,  it  may  be  thought  of  as  a  term. 

• If one sees a number, it may be thought of as a term.

Example  9.7  Arithmetic  expressions  again  Our  opening  example  on  arithmetic  expressions 
had  a  serious  deficiency:  Parentheses  were  required  to  group  things.  We  would  like  a  grammar  that 
would  obey  the  usual  rules  of  mathematics:  Multiplication  and  division  take  precedence  over  addition 
and  subtraction  and,  in  the  event  of  a  draw,  operations  are  performed  from  left  to  right. 

We  can  distinguish  between  factors,  terms  (products  of  factors)  and  expressions  (sums  of  terms) 
to  enforce  the  required  precedence.  We  can  enforce  the  left  to  right  rule  by  one  of  two  methods: 

(a) We can build it into the syntax.

(b) We can insist that in the event of an ambiguity the leftmost operation should be performed.

The  latter  idea  has  important  ramifications  that  make  parsing  easier.  You'll  learn  about  that  when 
you study compilers. We'll use method (a). Here are the productions, with exp the start symbol.

exp → term | exp + term | exp - term
term → factor | term * factor | term / factor
factor → ( exp ) | id | number

As you can see, even this language fragment is a bit complicated. □
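The effect of these productions can be tried out with a short Python sketch. It is an assumption-laden illustration, not a compiler technique taken from the text: a reader that calls itself on exp before consuming any input would never stop, so the sketch reads "exp → exp + term" as "a term followed by any number of + term or - term", which builds the same left-leaning trees. The function names and the tuple encoding (op, left, right) are ours.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def tokenize(text):
    return re.findall(r"[A-Za-z_]\w*|\d+|[()+\-*/]", text)

def parse_exp(toks, pos):
    # exp -> term | exp + term | exp - term
    tree, pos = parse_term(toks, pos)
    while pos < len(toks) and toks[pos] in ("+", "-"):
        right, nxt = parse_term(toks, pos + 1)
        tree, pos = (toks[pos], tree, right), nxt   # earlier operations sit lower on the left
    return tree, pos

def parse_term(toks, pos):
    # term -> factor | term * factor | term / factor
    tree, pos = parse_factor(toks, pos)
    while pos < len(toks) and toks[pos] in ("*", "/"):
        right, nxt = parse_factor(toks, pos + 1)
        tree, pos = (toks[pos], tree, right), nxt
    return tree, pos

def parse_factor(toks, pos):
    # factor -> ( exp ) | id | number
    if pos >= len(toks):
        raise ValueError("unexpected end of input")
    if toks[pos] == "(":
        tree, pos = parse_exp(toks, pos + 1)
        if pos >= len(toks) or toks[pos] != ")":
            raise ValueError("expected ')'")
        return tree, pos + 1
    return toks[pos], pos + 1                       # id or number: a leaf

def parse(text):
    toks = tokenize(text)
    tree, pos = parse_exp(toks, 0)
    if pos != len(toks):
        raise ValueError("unexpected token")
    return tree

def evaluate(tree, env):
    if isinstance(tree, str):
        return env[tree] if tree in env else int(tree)
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))

print(parse("8 - 3 - 2"))   # ('-', ('-', '8', '3'), '2')
print(parse("2 + 3 * 4"))   # ('+', '2', ('*', '3', '4'))
```

The first tree shows the left to right rule and the second shows multiplication taking precedence over addition, with no parentheses in the input.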

By altering (4) of Definition 9.2 (p. 254), we can get other types of grammars. A more general
replacement rule is

4'. a finite set of productions of the form $v_1 \dots v_m \to x_1 \dots x_n$ where $v_i, x_i \in S \cup T$, $m \ge 1$ and
$n \ge 0$.

This  gives  what  are  called  phrase  structure  grammars.  Grammars  that  are  this  general  are  hard  to 
handle. 

A much more restrictive replacement rule is

4". a finite set of productions of the form $s \to t_1 \dots t_n$ or $s \to t_1 \dots t_n r$ where $r, s \in S$, $n \ge 0$
and $t_i \in T$.

This  gives  what  are  called  regular  grammars,  which  are  particularly  easy  to  study. 

Example 9.8  Finite automata and regular grammars  In Section 6.6 we studied finite automata
and, briefly, Turing machines. If $\mathcal{A}$ is a finite automaton, the language of $\mathcal{A}$, $L(\mathcal{A})$, is the
set of all input sequences that are recognized by $\mathcal{A}$. With a proper definition of recognition for Turing
machines, the languages of phrase structure grammars are precisely the languages recognized by
Turing machines. We will prove the analogous result for regular grammars:

Theorem 9.1  Regular Grammars  Let L be a set of strings of symbols. There is a
regular grammar G with L(G) = L if and only if there is a finite automaton $\mathcal{A}$ with $L = L(\mathcal{A})$.
In other words, the languages of regular grammars are precisely the languages recognized by
finite automata.
Proof: Suppose we are given an automaton $\mathcal{A} = (S, I, f, s_0, A)$. We must exhibit a regular grammar
G with $L(G) = L(\mathcal{A})$. Here it is.

1. The set of nonterminal symbols of G is S, the set of states of $\mathcal{A}$.

2. The set of terminal symbols of G is I, the set of input symbols of $\mathcal{A}$.

3. The start symbol of G is $s_0$, the start state of $\mathcal{A}$.

4. The productions of G are all things of the form $s \to it$ or $s \to j$ where $f(s, i) = t$ or $f(s, j) \in A$,
the accepting states of $\mathcal{A}$.

Clearly G is a regular grammar. It is not hard to see that $L(G) = L(\mathcal{A})$.
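The construction in items 1 to 4 is short enough to try out. The Python sketch below is an illustration under our own assumptions: the transition function f is a dictionary keyed by (state, input), a production is a triple, and the sample automaton (strings over a and b that end in b) is hypothetical. Its start state is not accepting, so the empty string plays no role in the comparison.

```python
from itertools import product

def dfa_to_grammar(f, accepting):
    """Productions of G as triples: (s, i, t) means s -> i t and (s, i, None) means s -> i."""
    prods = set()
    for (s, i), t in f.items():
        prods.add((s, i, t))              # f(s, i) = t gives s -> i t
        if t in accepting:
            prods.add((s, i, None))       # f(s, i) is accepting gives s -> i
    return prods

def derives(prods, s, w):
    """True if the terminal string w can be derived from the nonterminal s."""
    for lhs, i, t in prods:
        if lhs != s or not w or w[0] != i:
            continue
        if t is None:
            if len(w) == 1:
                return True
        elif derives(prods, t, w[1:]):
            return True
    return False

def accepts(f, start, accepting, w):
    """Run the deterministic automaton on w."""
    s = start
    for ch in w:
        s = f[(s, ch)]
    return s in accepting

# Hypothetical automaton: q1 is reached exactly when the last input was b.
f = {("q0", "a"): "q0", ("q0", "b"): "q1",
     ("q1", "a"): "q0", ("q1", "b"): "q1"}
G = dfa_to_grammar(f, {"q1"})

# Compare L(G) with the language of the automaton on all strings of length at most 5.
for n in range(6):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert derives(G, "q0", w) == accepts(f, "q0", {"q1"}, w)
```

A check on short strings is of course not a proof, but it is a quick way to see why each production mirrors one step of the automaton.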

Now suppose we are given a regular grammar G. We must exhibit an automaton $\mathcal{A}$ with
$L(\mathcal{A}) = L(G)$. Our proof is in three steps. First we show that there is a regular grammar G'
with L(G') = L(G) and with no productions of the form $s \to t$ with $t$ a nonterminal. Second we show that there is a regular
grammar G" with L(G") = L(G') in which the right sides of all productions are either empty
or of the form $is$, that is, $i$ is terminal and $s$ is nonterminal. Finally we show that there is a
nondeterministic finite automaton $\mathcal{N}$ that recognizes precisely L(G"). By the "no free will" theorem in
Section 6.6, this will complete the proof.

First step: Suppose that G contains productions of the form $s \to t$. We will construct a new regular
grammar G' with no productions of this form and L(G) = L(G'). (If there are no such productions,
simply let G' = G.) The terminal, nonterminal and start symbols of G' are the same as those of G.
Let R be the right side of a production in G that is not simply a nonterminal symbol. We will let
$s \to R$ be a production in G' if and only if

• $s \to R$ is a production in G, or

• There exist $x_1, \dots, x_n$ such that $s \to x_1$, $x_1 \to x_2$, ..., $x_{n-1} \to x_n$ and $x_n \to R$ are all productions
in G.

You should be able to convince yourself that L(G') = L(G).

Second  step:  We  now  construct  a  regular  grammar  G"  in  which  the  right  side  of  every  production 
is  either  empty  or  has  exactly  one  terminal  symbol.  The  terminal  symbols  of  G"  are  the  same  as 
those  of  G'.  The  nonterminal  symbols  of  G"  are  those  of  G'  plus  some  additional  ones  which  we'll 
define  shortly. 

Let $s \to i_1 \dots i_n t$ be a production in G'. If $n = 1$, this is also a production in G". By the
construction of G', we cannot have $n = 0$, so we can assume $n > 1$. Let $\sigma$ stand for the right side of
the production. Introduce $n - 1$ new states $(s, \sigma, k)$ for $2 \le k \le n$. Let G" contain the productions

• $s \to i_1 (s, \sigma, 2)$;

• $(s, \sigma, k) \to i_k (s, \sigma, k+1)$, for $1 < k < n$ (There are none of these if $n = 2$.);

• $(s, \sigma, n) \to i_n t$.

Let $s \to i_1 \dots i_n$ be a production in G'. (An empty right hand side corresponds to $n = 0$.) If
$n = 0$ or $n = 1$, let this be a production in G"; otherwise, use the idea of the previous paragraph,
omitting $t$. You should convince yourself that L(G") = L(G').

Third step: We now construct a nondeterministic finite automaton $\mathcal{N}$ that recognizes precisely L(G").
The states of $\mathcal{N}$ are the nonterminal symbols of G" together with a new state $a$, the start state is the
start symbol, and the input symbols are the terminal symbols. Let $a$ be an accepting state of $\mathcal{N}$.
Let $s \to R$ be a production of G". There are three possible forms for R:

• If R is empty, then s is an accepting state of $\mathcal{N}$.

• If $R = i$, then $(s, i, a)$ is an edge of $\mathcal{N}$.

• If $R = it$, then $(s, i, t)$ is an edge of $\mathcal{N}$.

You should convince yourself that $\mathcal{N}$ accepts precisely L(G"). □
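The second and third steps can be sketched in Python as follows. This is an illustration under assumptions of our own: the first step is taken as already done (no production has a lone nonterminal on its right side), a production is a triple (s, terminals, t) with t a nonterminal or None, the new accepting state is the string "accept", and the sample grammar is hypothetical.

```python
ACCEPT = "accept"   # plays the role of the new state a; assumed not to be a nonterminal

def split_long(prods):
    """Second step: make every right side contain at most one terminal."""
    out = []
    for s, terms, t in prods:
        n = len(terms)
        if n <= 1:
            out.append((s, terms, t))
            continue
        sigma = (terms, t)                        # the right side of the production
        out.append((s, terms[:1], (s, sigma, 2)))
        for k in range(2, n):                     # 1 < k < n
            out.append(((s, sigma, k), terms[k - 1:k], (s, sigma, k + 1)))
        out.append(((s, sigma, n), terms[n - 1:], t))
    return out

def build_nfa(prods):
    """Third step: return (edges, accepting) of the nondeterministic automaton."""
    edges, accepting = set(), {ACCEPT}
    for s, terms, t in prods:
        if not terms:
            accepting.add(s)                      # R is empty
        elif t is None:
            edges.add((s, terms[0], ACCEPT))      # R = i
        else:
            edges.add((s, terms[0], t))           # R = i t
    return edges, accepting

def recognizes(edges, accepting, start, word):
    """Follow all possible edges at once, keeping the set of reachable states."""
    current = {start}
    for ch in word:
        current = {t for (s, i, t) in edges if s in current and i == ch}
    return bool(current & accepting)

# Hypothetical grammar:  S -> ab S | c T | (empty),   T -> def
prods = [("S", ("a", "b"), "S"), ("S", ("c",), "T"),
         ("T", ("d", "e", "f"), None), ("S", (), None)]
edges, accepting = build_nfa(split_long(prods))
print(recognizes(edges, accepting, "S", "ababcdef"))   # True
print(recognizes(edges, accepting, "S", "abc"))        # False
```

Keeping a set of current states is one way to simulate nondeterminism directly; it is the same idea that underlies the conversion to a deterministic machine.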
Exercises

9.2.1. Draw trees like Figure 9.3(c) to interpret the following expressions.

(a) (((1 + 2) + 3) + 4) + 5

(b) ((1 + 2) + (3 + 4)) + 5

(c) 1 + (2 + (3 + (4 + 5)))

(d) (X + 5*Y)/(X - Y)

(e) (X*Y - 3) + X*(Y + 1)

9.2.2. How might the ideas in Example 9.5 be modified to allow for unary minus as in (-A) * B?

9.2.3.  Write  pseudocode  for  the  two  methods  suggested  in  Example  9.5  for  calculating  the  value  of  an 
arithmetic  expression. 

9.2.4.  Using  the  grammar  of  Example  9.7,  construct  parse  trees  for  the  following  sentences. 

(a) (x + 5*y)/(x - y)

(b) (x*y - 3) + x*(y + 1)

9.2.5.  Add  the  following  features  to  the  expressions  of  Example  9.7. 

(a)  Unary  minus. 

(b)  Exponentiation  using  the  operator  **.  Ambiguities  are  resolved  in  the  reverse  manner  from  the 
other  arithmetic  operations:  A  **  B  **  C  is  the  same  as  A  **  (B  **  C). 

(c) Replacement using the operator :=. Only one use of the operator is allowed.

(d) Multiple replacement as in

A := 4 + B := C := 5 + 2 * 3,

which means that C is set equal to 5 + 2 * 3, B is set equal to C and A is set equal to
4 + B.

9.2.6. Let G be the grammar with the start symbol s and the productions

(i) $s \to xt$ and $s \to yt$;

(ii) $t \to R$ where R is either empty or one of $+xt$, $+yt$ or $-xt$.

(a) Describe L(G).

(b) Follow the steps in the construction of the corresponding nondeterministic finite automaton;
that is, describe G', G" and $\mathcal{N}$ that were constructed in the proof.

(c) Continuing the previous part, construct a deterministic machine as done in Section 6.6 corresponding
to $\mathcal{N}$.

(d) Can you construct a simpler deterministic machine to recognize L(G)?
We'll begin with a review of material discussed in Examples 7.9 (p. 206) and 7.10. Roughly speaking,
an unlabeled RP-tree is an RP-tree with the vertex labels erased. Thus, the order of the sons of a
vertex  is  still  important.  A  tree  is  "binary"  (resp.  "full  binary")  if  each  nonleaf  has  at  most  (resp. 
exactly)  two  sons.  Figure  9.5  shows  some  unlabeled  full  binary  RP-trees.  Here  is  a  more  precise 
pictorial  definition.  Compare  it  to  Definition  9.1  for  (labeled)  RP-trees. 

Definition  9.4  Unlabeled  binary  rooted  plane  trees  The  following  are  unlabeled 
binary  RP-trees.  Roots  are  indicated  by  •  and  other  vertices  by  o. 

(i)  The  single  vertex  •  is  such  a  tree. 

(ii) If $T_1$ is one such tree, so is the tree formed by (a) drawing $T_1$ root upward, (b) adding a •
above $T_1$ and connecting • to the root of $T_1$, and (c) changing the root of $T_1$ to o.

(iii) If $T_1$ and $T_2$ are two such trees, so is the tree formed by (a) drawing $T_1$ to the left of $T_2$, both
root upward, (b) adding a • above them and connecting it to their roots, and (c) changing
the roots of $T_1$ and $T_2$ to o's.

If  we  omit  (ii),  the  result  is  unlabeled  full  binary  RP-trees. 

These trees are often referred to as "unlabeled ordered (full) binary trees." Why? To define a binary
tree, one needs to have a root. Drawing a tree in the plane is equivalent to ordering the children of
each vertex. Sometimes the adjective "full" is omitted. In this section, we'll study unlabeled ordered
full binary trees.

We  can  build  all  unlabeled  full  binary  RP-trees  recursively  by  applying  the  definition  over  and 
over. To begin with there are no trees, so all we get is a single vertex by (i) of the definition. This
tree can then be used to build the 3 vertex full binary RP-tree shown in the next step of Figure 9.5.
Using the two trees now available, we can build the three new trees shown in the right hand step of
Figure 9.5. In general, if we have a total of $t_n$ trees at step n, then $t_1 = 1$ (the single vertex) and
$t_{n+1} = 1 + (t_n)^2$ (either use the single vertex tree or join two trees $T_1$ and $T_2$ to a new root).

Example  9.9  Counting  and  listing  unlabeled  full  binary  RP-trees  How  many  unlabeled 
full  binary  RP-trees  are  there  with  n  leaves?  How  can  we  list  them? 

As we shall see, answers to these questions come almost immediately from the recursive definition.
It is important to note that

Definition  9.4  provides  exactly  one  way  to  produce  every  unlabeled  full  binary  RP-tree. 

If  there  were  more  than  one  way  to  produce  some  of  the  trees  from  the  definition,  we  would  not  be 
able  to  obtain  answers  to  our  questions  so  easily,  if  at  all. 

We begin with counting. Let $b_n$ be the desired number. Clearly $b_0 = 0$, since a tree has at least
one leaf. Let's look at how our definition leads to trees with n leaves.

According to the definition, an unlabeled full binary RP-tree will be either a single vertex,
which contributes to $b_1$, or it will have exactly two principal subtrees, both of which are unlabeled
full binary RP-trees. If the first of these has k leaves, then the second must have $n - k$. By the Rules
of Sum and Product,

$$b_n = \sum_{k=1}^{n-1} b_k b_{n-k} \quad \text{if } n > 1. \tag{9.3}$$

Using  this  we  can  calculate  the  first  few  values  fairly  easily: 


$$b_1 = 1 \quad b_2 = 1 \quad b_3 = 2 \quad b_4 = 5 \quad b_5 = 14 \quad b_6 = 42 \quad b_7 = 132.$$
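The recursion (9.3) translates almost word for word into a program. The following Python sketch (the function name leaf_counts is ours) builds the table of values from $b_0 = 0$ and $b_1 = 1$.

```python
def leaf_counts(n_max):
    """Return a list b with b[n] = number of n-leaf unlabeled full binary RP-trees."""
    b = [0, 1]                                   # b_0 = 0 and b_1 = 1
    for n in range(2, n_max + 1):
        # Recursion (9.3): sum over the number k of leaves in the first principal subtree.
        b.append(sum(b[k] * b[n - k] for k in range(1, n)))
    return b

print(leaf_counts(7)[1:])   # [1, 1, 2, 5, 14, 42, 132]
```

Storing the earlier values in a list means each $b_n$ is computed once, so the table up to n takes on the order of $n^2$ multiplications.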
Figure 9.5 The first three stages in building unlabeled full binary RP-trees recursively. A o is a vertex of
a  previously  constructed  tree  and  a  •  is  the  root  of  a  new  tree. 


Notice  how  the  recursion  came  almost  immediately  from  the  definition. 

So  far,  this  has  all  been  essentially  a  review  of  material  in  Examples  7.9  and  7.10.  Now  we'll 
look  at  something  new:  listing  the  trees  based  on  the  recursive  description.  Here's  some  pseudocode 
to list all unlabeled full binary RP-trees with n leaves.

/* Make a list of n-leaf unlabeled full binary RP-trees */
Procedure BRPT(n)

   If (n = 1), then Return the list containing only the single vertex tree

   Set List empty

   For k = 1, 2, ..., n - 1:

      /* Get a list of first principal subtrees. */
      S_L = BRPT(k)

      /* Get a list of second principal subtrees. */
      S_R = BRPT(n - k)

      For each T_1 ∈ S_L:
         For each T_2 ∈ S_R:
            Add JOIN(T_1, T_2) to List
         End for
      End for
   End for
   Return List

End

The procedure JOIN($T_1$, $T_2$) creates a full binary RP-tree with principal subtrees $T_1$ and $T_2$. The
outer for loop is running through the terms in the summation in (9.3). The inner for loops are
constructing each of the trees that contribute to the product $b_k b_{n-k}$. This parallel between the
code and (9.3) is perfectly natural: They both came from the same recursive description of how to
construct unlabeled full binary RP-trees. □
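For readers who want to run the listing procedure, here is one possible Python rendering. The encoding is our own assumption: the single vertex tree is the empty tuple and JOIN makes a pair of the two principal subtrees.

```python
LEAF = ()   # assumed encoding of the single vertex tree

def join(t1, t2):
    """JOIN: a new root whose principal subtrees are t1 and t2."""
    return (t1, t2)

def brpt(n):
    """List the n-leaf unlabeled full binary RP-trees in the order used by BRPT."""
    if n == 1:
        return [LEAF]
    trees = []
    for k in range(1, n):            # number of leaves in the first principal subtree
        s_left = brpt(k)
        s_right = brpt(n - k)
        for t1 in s_left:
            for t2 in s_right:
                trees.append(join(t1, t2))
    return trees

print([len(brpt(n)) for n in range(1, 8)])   # [1, 1, 2, 5, 14, 42, 132]
print(brpt(3))                               # [((), ((), ())), (((), ()), ())]
```

The lengths of the lists agree with the values computed from (9.3), as the parallel between the loops and the summation predicts.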

What  happened  in  this  example  is  typical.  Given  a  recursive  description  of  how  to  uniquely 
construct  all  objects  in  some  set,  we  can  provide  both  a  recursion  for  the  number  of  them  and 
pseudocode  to  list  all  of  them.  It  is  not  so  obvious  that  such  a  description  usually  leads  to  ranking 
and  unranking  algorithms  as  well.  Rather  than  attempt  a  theoretical  explanation  of  how  to  do  this, 
we'll  look  at  examples.  Since  it's  probably  not  fresh  in  your  mind,  it  would  be  a  good  idea  to  review 
the  concepts  of  ranking  and  unranking  from  Section  3.2  (p.  75)  at  this  time. 
B(1):          B(n):

J(B(1)×B(n-1))    J(B(2)×B(n-2))    ···    J(B(n-1)×B(1))

Figure 9.6 The local description for n-leaf unlabeled full binary RP-trees. We assume that n > 1 in the
right hand figure. B stands for BRPT and J(C, D) = {JOIN(c, d) | c ∈ C, d ∈ D}, with the set made into
an ordered list using lexicographic order based on the orderings in C and D obtained by lex ordering all the
pairs (RANK(c), RANK(d)), with c ∈ C and d ∈ D.


Example  9.10  A  ranking  for  permutations  We'll  start  with  permutations  since  they're  fairly 
simple. 

Suppose that S is an n-set with elements $s_1 < \dots < s_n$. The local description of how to generate
L(S), the permutations of S listed in lex order, is given in Figure 7.2 (p. 213). This can be converted
to a verbal recursive description: Go through the elements $s_i \in S$ in order. For each $s_i$, list all the
permutations of $S - \{s_i\}$ in lex order, each preceded by "$s_i$,". (The comma after $s_i$ is not a misprint.)

How does this lead to RANK and UNRANK functions? Let $\sigma\colon \underline{n} \to \underline{n}$ be a permutation and
let RANK($s_{\sigma(1)}, \dots, s_{\sigma(n)}$) denote the rank of $s_{\sigma(1)}, \dots, s_{\sigma(n)}$. Since the description is recursive, the
rank formula will be as well. We need to start with n = 1. Since there is only one permutation of a
one-element set, RANK($\sigma$) = 0.

Now suppose n > 1. There are $\sigma(1) - 1$ principal subtrees of L(S) to the left of the subtree
containing the permutations that begin with $s_{\sigma(1)}$, each of which has (n - 1)! leaves. Thus we have

$$\mathrm{RANK}(s_{\sigma(1)}, \dots, s_{\sigma(n)}) = (\sigma(1) - 1)(n-1)! + \mathrm{RANK}(s_{\sigma(2)}, \dots, s_{\sigma(n)}).$$

You should also be able to see this by looking at Figure 7.2.

As usual the rank formula can be "reversed" to do the unranking: Let UNRANK(r, S) denote
the permutation of the set S that has rank r. Let $q = r/(n-1)!$ with the remainder discarded. Then

$$\mathrm{UNRANK}(r, S) = s_{q+1},\ \mathrm{UNRANK}(r - (n-1)!\,q,\ S - \{s_{q+1}\}).$$ □
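Both formulas are easy to check by machine. In the Python sketch below (function names ours), the quantity $\sigma(1) - 1$ is computed as the number of remaining elements smaller than the first one, which is its position among the elements still available.

```python
from itertools import permutations
from math import factorial

def rank(perm):
    """Lex rank of perm among the permutations of its own elements."""
    n = len(perm)
    if n == 1:
        return 0
    smaller = sum(1 for x in perm[1:] if x < perm[0])    # this is sigma(1) - 1
    return smaller * factorial(n - 1) + rank(perm[1:])

def unrank(r, elements):
    """The permutation of elements that has lex rank r, as a list."""
    s = sorted(elements)
    n = len(s)
    if n == 1:
        return [s[0]]
    q, rest = divmod(r, factorial(n - 1))                # q = r/(n-1)!, remainder kept in rest
    return [s[q]] + unrank(rest, s[:q] + s[q + 1:])

# itertools.permutations lists the permutations of a sorted input in lex order.
for i, p in enumerate(permutations("abcd")):
    assert rank(p) == i
    assert tuple(unrank(i, "abcd")) == p
```

The loop at the end confirms, for a 4-set, that rank and unrank are inverse to each other and agree with lex order.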

Example  9.11  A  ranking  for  unlabeled  full  binary  RP-trees  How  can  we  rank  and  unrank 
BRPT(n),  the  n-leaf  unlabeled  full  binary  RP-trees? 

Either Definition 9.4 or the listing algorithm BRPT that we obtained from it in Example 9.9 can
be used as a starting point. Each gives a local description of a decision tree for generating n-leaf
unlabeled full binary RP-trees. The only thing that is missing is an ordering of the sons in the
decision tree, which can be done in any convenient manner. The listing algorithm provides a specific
order for listing the trees, so we'll use it. It's something like lex order:

• first by size of the left tree (the outer loop on k),

• then by the rank of the left tree (the middle loop on $T_1 \in S_L$),                (9.4)

• finally by the rank of the right tree (the inner loop on $T_2 \in S_R$).

The left part of Figure 9.6 is not a misprint: the top • is the decision tree and the bottom • is its
label: the 1-leaf tree. Carrying this a bit further, Figure 9.7 expands the local description for 2 and
3 leaves.

Expanding this local description for any value of n would give the complete decision tree in
a nonrecursive manner. However, we have learned that expanding recursive descriptions is usually
unnecessary  and  often  confusing.  In  fact,  we  can  obtain  ranking  and  unranking  algorithms  directly 
from  the  local  description,  as  we  did  for  permutations  in  the  preceding  example. 
B(2):                    B(3):
  B(1)×B(1)                B(1)×B(2)                 B(2)×B(1)
                           B(1)×(B(1)×B(1))          (B(1)×B(1))×B(1)

Figure 9.7 An expansion of Figure 9.6 for n = 2 and n = 3. The upper trees are the expansion of
Figure 9.6. The lower trees are the full binary RP-trees that occur at the leaves.


Let's get a formula for the rank. Since our algorithm for listing is recursive, our rank formula
will also be recursive. We must start our recursive formula with the smallest case, $|T| = 1$. In this
case there is only one tree, namely a single vertex. Thus T = • and RANK(T) = 0.

Now suppose $|T| > 1$ and let $T_1$ and $T_2$ be its first and second principal subtrees. (Or left
and right, if you prefer.) We need to know which trees come before T in the ranking. Suppose Q
has principal subtrees $Q_1$ and $Q_2$ and $|Q| = |T|$. The information in (9.4) says that the tree T is
preceded by precisely those trees Q for which $|Q| = |T|$, and either

• $|Q_1| < |T_1|$ OR

• $|Q_1| = |T_1|$ AND RANK($Q_1$) < RANK($T_1$) OR

• $Q_1 = T_1$ AND RANK($Q_2$) < RANK($T_2$).

The number of trees in each of these categories is

• $\sum_{k<|T_1|} b_k b_{n-k}$, where $n = |T|$ and terms were collected by $k = |Q_1|$,

• RANK($T_1$) × $b_{|T_2|}$, and

• 1 × RANK($T_2$).

Hence 


Theorem 9.2  Rank of Unlabeled Full Binary RP-Trees  The rank of an unlabeled full
binary RP-tree with n leaves is 0 if n = 1 and otherwise is

$$\mathrm{RANK}(T) = \sum_{k<|T_1|} b_k b_{n-k} + \mathrm{RANK}(T_1)\, b_{|T_2|} + \mathrm{RANK}(T_2), \tag{9.5}$$

where $T_1$ and $T_2$ are the first and second principal subtrees of T and $b_k$ is the number of k-leaf
unlabeled full binary RP-trees.
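$$\mathrm{RANK}(T) = b_1 b_4 + b_2 b_3 + \mathrm{RANK}(T')\, b_2 + \mathrm{RANK}(T'')$$
$$\mathrm{RANK}(T') = b_1 b_2 + \mathrm{RANK}(T'')\, b_1 + \mathrm{RANK}(\bullet)$$
$$\mathrm{RANK}(T'') = \mathrm{RANK}(\bullet)\, b_1 + \mathrm{RANK}(\bullet)$$

Figure 9.8 A recursive rank calculation for an unlabeled full binary RP-tree with 5 leaves. (In place of the
tree drawings, T is the 5-leaf tree, T' is its 3-leaf first principal subtree and T'' is the 2-leaf tree.)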


Figure  9.9    The  8-leaf  unlabeled  full  binary  RP-tree  of  rank  250. 


Figure  9.8  shows  how  (9.5)  is  used  to  compute  rank.  Each  of  the  equations  given  there  is  a 
special  case  of  (9.5).  The  first  equation  gives  the  rank  of  the  tree  we  are  interested  in.  The  other  two 
equations  give  ranks  that  are  needed  because  of  the  recursive  nature  of  (9.5).  One  can  now  work 
from the bottom up using RANK(•) = 0 to get ranks of 0, 1 and 8, respectively.

As always, unranking uses a greedy algorithm. Let UNRANK(R, n) denote the n-leaf full binary
RP-tree with rank R. Let's compute UNRANK(250, 8), the 8-leaf full binary RP-tree with rank 250.
Being greedy, we want $T_1$, the left principal tree, to have as many leaves as possible. We have

$$b_1 b_7 + b_2 b_6 + b_3 b_5 + b_4 b_4 = 227$$

and

$$b_1 b_7 + b_2 b_6 + b_3 b_5 + b_4 b_4 + b_5 b_3 = 227 + 28 > 250.$$

Thus the first principal tree, $T_1$, has five leaves and the second, $T_2$, has three. Now we want RANK($T_1$)
to be as large as possible. Since RANK($T_1$)$b_3$ + RANK($T_2$) = 250 - 227 = 23 and $23/b_3 = 23/2 = 11.5$,
$T_1$ has rank 11 and $T_2$ has rank $23 - 11 b_3 = 1$. Thus $T_1$ = UNRANK(11, 5) and $T_2$ = UNRANK(1, 3).
We'll compute $T_2$ first. We have $b_1 b_2 = 1$ and $b_2 b_1 = 1$, so the first principal subtree of $T_2$ has two
leaves and the second has one. Since there is only one 2-leaf tree and only one 1-leaf tree, we are done
with $T_2$. Since $b_1 b_4 + b_2 b_3 + b_3 b_2 = 9$, the first principal subtree of $T_1$ has four leaves and rank 2
while the second has one leaf. Since $b_1 b_3 = 2$, both principal subtrees of this 4-leaf tree have two
leaves. Putting this all together, we get the tree shown in Figure 9.9. □
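Formula (9.5) and the greedy unranking just carried out by hand can be sketched in Python. The encoding is an assumption of ours: a leaf is the empty tuple and any other tree is the pair of its principal subtrees. The function names are also ours.

```python
from functools import lru_cache

LEAF = ()   # assumed encoding of the single vertex tree

@lru_cache(maxsize=None)
def b(n):
    """Number of n-leaf unlabeled full binary RP-trees, from recursion (9.3)."""
    return 1 if n == 1 else sum(b(k) * b(n - k) for k in range(1, n))

def leaves(tree):
    return 1 if tree == LEAF else leaves(tree[0]) + leaves(tree[1])

def rank(tree):
    """Formula (9.5)."""
    if tree == LEAF:
        return 0
    t1, t2 = tree
    n1, n2 = leaves(t1), leaves(t2)
    n = n1 + n2
    before = sum(b(k) * b(n - k) for k in range(1, n1))   # trees with a smaller first subtree
    return before + rank(t1) * b(n2) + rank(t2)

def unrank(r, n):
    """Greedy unranking: first fix |T1|, then RANK(T1), then RANK(T2)."""
    if n == 1:
        return LEAF
    k = 1
    while r >= b(k) * b(n - k):       # skip whole blocks of trees with |T1| = k
        r -= b(k) * b(n - k)
        k += 1
    q, rest = divmod(r, b(n - k))     # r = RANK(T1) * b_{|T2|} + RANK(T2)
    return (unrank(q, k), unrank(rest, n - k))

L = LEAF
assert unrank(250, 8) == ((((L, L), (L, L)), L), ((L, L), L))
assert all(rank(unrank(r, 6)) == r for r in range(b(6)))
```

The first assertion reproduces the hand computation of UNRANK(250, 8); the second checks that rank and unrank are inverse to each other for all 42 trees with 6 leaves.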


Example  9.12  Computing  the  rank  without  recursion  In  Example  9.11  (p.  261)  we  proved 
the  recursive  formula  (9.5)  for  the  rank  of  an  unlabeled  full  binary  RP-tree.  This  formula  can  be 
implemented  as  it  stands  by  a  recursive  computer  program;  however,  recursive  procedures  can  be 
inconvenient  for  hand  calculations.  We  can  implement  the  formula  by  a  depth  first  postorder  vertex 
traversal  of  the  tree  that  we  want  to  rank. 

The  last  time  we  visit  a  vertex,  we  simply  record  the  rank  and  number  of  leaves  in  the  subtree 
which  has  that  vertex  as  its  root.  If  we  are  at  a  leaf,  the  rank  is  0  and  the  number  of  leaves  is  1. 
Suppose  we  have  reached  some  tree  T  that  is  not  a  leaf.  In  the  notation  of  (9.5)  we  have  available 
RANK($T_1$), $|T_1|$, RANK($T_2$) and $|T_2|$. From this we can compute RANK(T) using (9.5) because
$n = |T_1| + |T_2|$. The values of the $b_i$'s can be computed ahead of time and written in a table.
Figure 9.10 applies this idea to the tree in Figure 9.9. Rather than work in depth first postorder, we
can simply do the vertices depth by depth, starting at the lowest depth. □
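The same bookkeeping can be done by a program that uses an explicit stack instead of recursion. The Python sketch below is one way to do it, under the assumed encoding that a leaf is the empty tuple and any other tree is the pair of its principal subtrees; at the last visit to a vertex it records the pair L:R of Figure 9.10 as (leaves, rank).

```python
def leaf_count_table(n_max):
    """Table b with b[k] = number of k-leaf trees, computed ahead of time from (9.3)."""
    b = [0, 1]
    for n in range(2, n_max + 1):
        b.append(sum(b[k] * b[n - k] for k in range(1, n)))
    return b

def rank_postorder(tree, b):
    """Return (number of leaves, rank) of tree by a postorder traversal with a stack."""
    stack = [(tree, False)]
    done = []                                  # (leaves, rank) of finished subtrees
    while stack:
        node, expanded = stack.pop()
        if node == ():
            done.append((1, 0))                # a leaf: one leaf, rank 0
        elif not expanded:
            stack.append((node, True))         # come back after both subtrees
            stack.append((node[1], False))
            stack.append((node[0], False))     # the first subtree is processed first
        else:
            n2, r2 = done.pop()                # second principal subtree
            n1, r1 = done.pop()                # first principal subtree
            n = n1 + n2
            before = sum(b[k] * b[n - k] for k in range(1, n1))
            done.append((n, before + r1 * b[n2] + r2))    # formula (9.5)
    return done.pop()

L = ()
tree = ((((L, L), (L, L)), L), ((L, L), L))    # the tree found for UNRANK(250, 8)
print(rank_postorder(tree, leaf_count_table(8)))          # (8, 250)
```

Once a vertex has been given its pair, the pairs of its two sons are discarded, just as information is discarded in the figure.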
[The tree of Figure 9.9 with vertex labels of the form L:R; the root is labeled 8:250 and the lowest leaves are labeled 1:0.]

Figure  9.10  Computing  the  rank  of  the  tree  in  Figure  9.9.  As  we  move  up  in  the  tree,  we  replace  a  vertex  v 
with  L:  R  where  R  is  the  rank  of  the  subtree  rooted  at  v  and  L  is  how  many  leaves  it  has.  When  information 
is  no  longer  needed,  we  discard  it  to  keep  the  figures  from  getting  cluttered. 


Example 9.13  Calculating statistics for RP-trees  When RP-trees are used as data structures,
items of data may be stored at the leaves and an "action" such as finding an item in a list
may involve finding the appropriate leaf by traversing the path from the root to the leaf. How fast
can data in such a tree be accessed?

The tree is being used as a decision tree and the number of decisions needed to reach the leaf
equals the length of the path. Thus, the time needed to find an item this way is usually nearly
proportional to the length of the path traversed. Given an RP-tree T, the length of the longest path
from the root to a leaf, m(T), say, and the average over all leaves of the lengths of the paths, $\mu(T)$,
say, are therefore important measures of how good the tree is for storing data. Worst case (i.e.,
longest) time is proportional to m(T) and average time to $\mu(T)$.

Given a particular tree T, we could calculate m(T) and $\mu(T)$. Suppose we are told that the
algorithm for creating the data structure constructs a random tree from some class; e.g., the set of
n-leaf unlabeled full binary RP-trees. How can we get information about $\mu(T)$ in this case? Here are
some possible approaches assuming we are dealing with unlabeled full binary RP-trees.

• Average over all: We might try to compute the average value of $\mu(T)$ over all such trees, provided
we have a way to list all of them. Call the average value $\mu(n)$. We could then compute $\mu(T)$ for
each tree T and average the results to obtain $\mu(n)$. If we are lucky enough to have an unranking
algorithm, we can list all the trees using UNRANK(i, n) for $i = 0, 1, \ldots, b_n - 1$. Unfortunately, $b_n$
is much too large for realistic values of n, so the program would take too long to run.

• Generate some at random and average: Another method is to generate n-leaf unlabeled full binary
RP-trees at random by the method mentioned at the start of Section 3.2: Choose a random
integer in $[0, b_n)$, unrank it and study the result. Repeat this procedure many times to get a
good estimate. Choosing many elements from a set at random and studying them is known as
the Monte Carlo method or Monte Carlo simulation. Studying it would take us too far afield.

• Use generating functions: We might try to find a theoretical tool. Indeed, we'll be able to compute
$\mu(n)$ using generating functions (Exercise 11.2.16 (p. 329)). Theoretical methods can be wonderful
when they work, but they have a nasty habit of not working when we change the problem.
For example, our method will not give us the average of m(T), the length of the longest path
to the root. It can be estimated theoretically, but it is far more difficult than determining the
average of $\mu(T)$.

Suppose we are looking at structures where we don't have an unranking algorithm and we can't
afford to list all of them. It appears that we must use generating functions. This is not necessarily the
case. Suppose that we have an unranking algorithm for a set that contains the one we are interested
in and is not too much larger. We can use that algorithm, rejecting completely those structures that
lie outside the set of interest. Here is a general pseudocode procedure for this method.

Procedure ESTIMATE(setsize, ncases, parameters)
    Initialize

    /* Loop to generate ncases examples. */
    For i = 1, 2, ..., ncases:

        /* Need a case in set of interest. */
        Set needcase to true
        While needcase is true:
            Choose a random integer j ∈ [0, setsize)
            T = UNRANK(j, parameters)
            If T is okay, then set needcase to false
        End while

        Store information about T as desired
    End for

    Finish it up: calculate and output concluding information
End

We will not study such algorithms in this text. □
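
The rejection idea behind ESTIMATE can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the callables `unrank`, `is_okay` and `statistic` are hypothetical stand-ins for UNRANK, the "T is okay" test and the information stored about T, and the sketch simply averages one number per accepted case.

```python
import random

def estimate(setsize, ncases, unrank, is_okay, statistic, rng=None):
    """Average `statistic` over `ncases` random structures from the set of interest.

    unrank(j) builds the structure of rank j in the larger set (hypothetical helper);
    is_okay(t) says whether t lies in the set of interest (hypothetical helper).
    """
    rng = rng or random.Random()
    total = 0.0
    for _ in range(ncases):
        # Keep drawing ranks until the unranked structure is acceptable.
        while True:
            j = rng.randrange(setsize)   # random integer in [0, setsize)
            t = unrank(j)
            if is_okay(t):
                break
        total += statistic(t)
    return total / ncases
```

For instance, taking the larger set to be the integers 0 to 99, accepting only even integers, and using the integer itself as the statistic should give an estimate near 49, the mean of the even integers in that range.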

Exercises 


9.3.1.  Using  Definition  9.1,  recursively  construct  all  unlabeled  RP-trees  with  at  most  4  vertices.  Note  that 
you  are  not  supposed  to  simply  list  them.  You  should  iterate  the  definition  and  retain  all  trees  of  at 
most  4  vertices  that  arise.  When  the  list  does  not  change  during  an  iteration,  it  is  complete. 
Note:  You  are  asked  for  all  trees,  not  just  the  binary  ones. 

9.3.2.  Using  Definition  9.4,  recursively  construct  all  unlabeled  full  binary  RP-trees  with  at  most  4  leaves. 

9.3.3.  Construct  a  table  of  bn  for  n  <  10. 

9.3.4.  Compute  the  ranks  of  the  unlabeled  full  binary  RP-trees  shown  here. 


9.3.5. Construct the unlabeled full binary RP-trees with eight leaves whose ranks are 100, 200, 300 and 400.

9.3.6. Prove that a full binary RP-tree with n leaves has n − 1 other vertices.

9.3.7. We are interested in the unlabeled full binary RP-tree with n leaves and rank $b_n/2$; i.e., the tree just
past the middle of the list. Call the tree $M_n$.

(a) Construct $M_3$, $M_5$ and $M_7$.

(b) Conjecture and prove the nature of $M_n$ when n is odd.

(c) Conjecture and prove the nature of $M_n$ when n is even.

9.3.8. Provide a recursive method for calculating the rank of a decreasing function in lex order.

9.3.9. Use equivalence relations to provide a formal definition for unlabeled RP-trees in terms of labeled
RP-trees.

9.3.10. Provide a recursive method for calculating the lex order rank of a permutation.

9.3.11. Let ** stand for the binary operation of exponentiation. How parentheses are placed in the expression
x**y**z affects the answer. Thus 3**(2**3) = $3^{(2^3)} = 3^8 = 6561$ while (3**2)**3 = $(3^2)^3 = 3^6 = 729$.
We would like to generate all ways of parenthesizing $x_1 ** \cdots ** x_n$. This can be done by first selecting
the last ** operation to be performed as in

$$(x_1 ** \cdots ** x_k) ** (x_{k+1} ** \cdots ** x_n),$$

and then proceeding recursively on $x_1 ** \cdots ** x_k$ and $x_{k+1} ** \cdots ** x_n$. (If k = 1 we have simply $(x_1)$
on the left.) The recursion stops when every innermost pair of parentheses contains just one number
as in $(x_1)$. Call this final result a "parenthesizing."

(a) Show that if you remove the $x_i$'s from a parenthesizing, it is possible to tell where they belong.
Thus all we need are the parentheses.

(b) Show that the set of possible parentheses patterns leads to a tree that looks the same as that in
Figure 9.6 with • replaced by ( ) and JOIN(A, B) interpreted as (AB).

(c)  It  follows  from  (b)  that  there  is  a  simple  correspondence  between  the  parenthesized  expressions 
and  unlabeled  full  binary  RP-trees.  Describe  it. 

9.3.12. If $T_i$ are unlabeled RP-trees, let $[T_1, \ldots, T_k]$ denote the unlabeled rooted RP-tree in which the ith
edge from the root leads to the root of $T_i$. In particular, [ ] is the tree •. We define a map f from the
unlabeled RP-trees to the unlabeled full binary RP-trees recursively as follows. Let f([ ]) = [ ] = •
and

$$f([T_1, \ldots, T_k]) = [\,f(T_1),\ f([T_2, \ldots, T_k])\,]$$

when $k > 0$. Pictorially,

[Diagram of the map f not reproduced.]


(a)  Show  that  /  is  a  bijection  between  n-vertex  unlabeled  RP-trees  and  n-leaf  unlabeled  full  binary 
RP-trees. 

(b)  Use  the  above  correspondence  to  find  a  procedure  for  ranking  and  unranking  the  set  of  all 
n-vertex  unlabeled  RP-trees.  Provide  a  local  description  like  Figure  9.6. 

*9.3.13. In this exercise you will obtain a formula for $b_n$ by proving a simple recursion. You might ask "How
would I be expected to discover such a result?" Our answer at this time would be "Luck and/or
experience." When we study generating functions, you'll have a more systematic method.

Let $\mathcal{L}_n$ be the set of n-leaf unlabeled full binary RP-trees with one leaf marked and let $\mathcal{V}_n$ be
the set with one vertex marked.

(a) Prove that $\mathcal{L}_n$ has $n b_n$ elements and that $\mathcal{V}_n$ has $(2n - 1)b_n$ elements.

(b) Consider the following operation on each element of $\mathcal{L}_{n+1}$. If x is the marked leaf, let f be its
father and b its brother. Remove x and shrink the edge between f and b so that f and b merge
into a single vertex. Prove that each element of $\mathcal{V}_n$ arises exactly twice if we do this with each
element of $\mathcal{L}_{n+1}$.

(c) From the previous results, conclude that $(n + 1)b_{n+1} = 2(2n - 1)b_n$.

(d) Use the recursion just derived to obtain a fairly simple formula for $b_n$.


Notes  and  References 


Books  on  combinatorial  algorithms  and  data  structures  usually  discuss  trees.  You  may  wish  to  look 
at  the  references  at  the  end  of  Chapter  6.  Grammars  are  discussed  extensively  in  books  on  compiler 
design  such  as  [1]. 

1. Alfred V. Aho and Jeffrey D. Ullman, Principles of Compiler Design, Addison-Wesley (1977).


PART  IV 

Generating  Functions 


Suppose we are given a sequence $a_0, a_1, \ldots$. The "ordinary generating function" associated with the
sequence $a_0, a_1, \ldots$ is the function A(x) whose value at x is the power series $\sum_{i \ge 0} a_i x^i$. In other
words, "ordinary generating function of" is a map (function) from sequences to power series that
"packages" the entire series of numbers $a_0, a_1, \ldots$ into a single function A(x). Generating functions
are not limited to sequences with single indices; for example, we will see that


For  simplicity,  this  introductory  discussion  is  phrased  in  terms  of  singly  indexed  sequences. 

It is often easier to manipulate generating functions than it is to manipulate sequences $a_0, a_1, \ldots$.
We  can  obtain  information  about  a  sequence  through  the  manipulation  of  its  generating  function. 
We  have  seen  a  little  of  this  in  Example  1.14  (p.  19)  and  in  Section  1.5  (p.  36).  In  the  next  two 
chapters,  we  will  study  generating  functions  in  more  detail. 

What sort of information is needed about a sequence $a_0, a_1, \ldots$ to obtain A(x), the generating
function for the sequence? Of course, an explicit formula for $a_n$ as a function of n would provide
this but there are often better methods:

• Recursions: A recursion for the $a_n$'s may give an equation that can be solved for A(x). We'll
study this in Sections 10.2 and 11.1.

• Constructions: A method for constructing the objects counted by the $a_n$'s may lead to an
equation for A(x). Sometimes this can be done using an extension of the Rules of Sum and
Product. See Sections 10.4 and 11.2.

Given a generating function A(x), what sort of information might we obtain about the sequence
$a_0, a_1, \ldots$ associated with it?

• An explicit formula: Taylor's Theorem from calculus is an important tool. It tells us that
$A(x) = \sum_{n \ge 0} (A^{(n)}(0)/n!)\, x^n$, and so gives us a formula if A(x) is simple enough. In particular,
we easily obtain a rather simple formula for $b_n$, the number of unlabeled binary trees with n
leaves. In Section 9.3 we obtained only a recursion.

• A recursion: Simply equate coefficients in an equation that involves A(x). This is done in various
places in the following chapters, but especially in Section 10.3.

• Statistical information: For example, the expected number of cycles in a random permutation of
n is approximately ln n. The degree of the root of a random labeled RP-tree is approximately
one more than a "Poisson random variable," a subject beyond the scope of this text.

• Asymptotic information: In other words, how does $a_n$ behave when n is large? Methods for doing
this are discussed in Section 11.4.

•  Prove  identities:  We  saw  some  of  this  in  Section  1.5  (p.  36). 

We hope that the preceding list has convinced you that generating functions are an important
tool that yields results that are sometimes difficult (or perhaps even impossible) to obtain by other
methods. If you find generating functions difficult to understand, keep this motivation in mind as
you study them.

The  next  chapter  introduces  the  basic  concepts  associated  with  generating  functions: 

•  Basic  concepts:  We  define  a  generating  function  and  look  at  some  basic  manipulations. 

•  Recursions:  If  you  have  encountered  recursions  in  other  courses  or  do  not  wish  to  study  them, 
you  can  skim  Section  10.2.  However,  if  you  had  difficulty  in  Section  10.1,  you  should  use  this 
section  to  obtain  additional  practice  with  generating  functions. 

• Manipulations: In Section 10.3 we present some techniques for manipulating generating functions.

•  Rule  of  Sum  and  Product:  We  consider  Section  10.4  to  be  the  heart  of  this  chapter.  In  it,  we 
extend  the  definition  of  generating  function  a  bit  and  obtain  Rules  of  Sum  and  Product  which  do 
for  generating  functions  what  the  rules  with  the  same  name  did  for  basic  counting  in  Chapter  1. 

In  Chapter  11  we  take  up  four  separate  topics: 

•  Systems  of  recursions:  This  is  a  continuation  of  the  discussion  in  Section  10.2. 

•  Exponential  generating  functions:  These  play  the  same  role  for  objects  with  labels  that  ordinary 
generating  functions  play  for  unlabeled  objects  in  Section  10.4. 

•  Counting  objects  with  symmetries:  We  apply  generating  functions  to  the  problems  discussed  in 
Section  4.3  (p.  111). 

•  Asymptotics:  We  discuss  methods  for  obtaining  asymptotics  from  generating  functions  and,  to 
a  lesser  extent,  from  recursions. 

The sections in Chapter 11 can be read independently of one another; however, some of the asymptotic
examples make use of results (but not methods) from Section 11.2.


CHAPTER  10 


Ordinary 
Generating  Functions 


Introduction 


We'll  begin  this  chapter  by  introducing  the  notion  of  ordinary  generating  functions  and  discussing 
the  basic  techniques  for  manipulating  them.  These  techniques  are  merely  restatements  and  simple 
applications  of  things  you  learned  in  algebra  and  calculus.  You  must  master  these  basic  ideas  before 
reading  further. 

In  Section  2,  we  apply  generating  functions  to  the  solution  of  simple  recursions.  This  requires  no 
new  concepts,  but  provides  practice  manipulating  generating  functions.  In  Section  3,  we  return  to 
the  manipulation  of  generating  functions,  introducing  slightly  more  advanced  methods  than  those 
in  Section  1.  If  you  found  the  material  in  Section  1  easy,  you  can  skim  Sections  2  and  3.  If  you  had 
some  difficulty  with  Section  1,  those  sections  will  give  you  additional  practice  developing  your  ability 
to  manipulate  generating  functions. 

Section  4  is  the  heart  of  this  chapter.  In  it  we  study  the  Rules  of  Sum  and  Product  for  ordinary 
generating  functions.  Suppose  that  we  are  given  a  combinatorial  description  of  the  construction  of 
some  structures  we  wish  to  count.  These  two  rules  often  allow  us  to  write  down  an  equation  for  the 
generating  function  directly  from  this  combinatorial  description.  Without  such  tools,  we  may  get 
bogged  down  in  lengthy  algebraic  manipulations. 


10.1   What  are  Generating  Functions? 


In  this  section,  we  introduce  the  idea  of  ordinary  generating  functions  and  look  at  some  ways  to 
manipulate  them.  This  material  is  essential  for  understanding  later  material  on  generating  functions. 
Be  sure  to  work  the  exercises  in  this  section  before  reading  later  sections! 


Definition 10.1 Ordinary generating function (OGF) Suppose we are given a sequence
$a_0, a_1, \ldots$. The ordinary generating function (also called OGF) associated with this sequence
is the function whose value at x is $\sum_{i=0}^{\infty} a_i x^i$. The sequence $a_0, a_1, \ldots$ is called the
coefficients of the generating function.

People often drop "ordinary" and call this the generating function for the sequence. This is also
called a "power series" because it is the sum of a series whose terms involve powers of x. The
summation is often written $\sum_{i \ge 0} a_i x^i$ or $\sum a_i x^i$.

If your sequence is finite, you can still construct a generating function by taking all the terms
after the last to be zero. If you have a sequence that starts at $a_k$ with $k > 0$, you can define
$a_0, \ldots, a_{k-1}$ to be any convenient values. "Convenient values" are ones that make equations nicer
in some sense. For example, if $H_{n+1} = 2H_n + 1$ for $n > 0$ and $H_1 = 1$, it is convenient to let $H_0 = 0$
so that the recursion is valid for $n \ge 0$. ($H_n$ is the number of moves required for the Tower of Hanoi
puzzle. See Exercise 7.3.9 (p. 218).) On the other hand, if $b_1 = 1$ and $b_n = \sum_{k=1}^{n-1} b_k b_{n-k}$ for $n > 1$,
it's convenient to define $b_0 = 0$ so that we have $b_n = \sum_{k=0}^{n} b_k b_{n-k}$ for $n \ne 1$. (The latter sum is a
"convolution", which we will define in a little while.)
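
The effect of these "convenient values" can be checked numerically. The following Python sketch is only an illustration; the function names `hanoi_moves` and `full_binary_counts` are hypothetical and not from the text. It builds both sequences from the padded starting values and confirms that, with $b_0 = 0$, the sum over k from 0 to n reproduces $b_n$ for every n other than 1.

```python
def hanoi_moves(limit):
    """Return [H_0, ..., H_limit] using H_0 = 0 and H_{n+1} = 2*H_n + 1."""
    h = [0]
    for n in range(limit):
        h.append(2 * h[n] + 1)
    return h

def full_binary_counts(limit):
    """Return [b_0, ..., b_limit] using b_0 = 0, b_1 = 1 and the sum over k = 1..n-1."""
    b = [0, 1]
    for n in range(2, limit + 1):
        b.append(sum(b[k] * b[n - k] for k in range(1, n)))
    return b

b = full_binary_counts(10)
for n in range(11):
    full_sum = sum(b[k] * b[n - k] for k in range(n + 1))
    # The extended sum matches b_n except at n = 1, where it gives 0.
    assert (full_sum == b[n]) == (n != 1)
```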

To help us keep track of which generating function is associated with which sequence, we try
to use lower case letters for sequences and the corresponding upper case letters for the generating
functions. Thus we use the function A as generating function for a sequence of $a_n$'s and B as the
generating function for $b_n$'s. Sometimes conventional notation for certain sequences makes this upper
and lower case pairing impossible. In those cases, we improvise.

You  may  have  noticed  that  our  definition  is  incomplete  because  we  spoke  of  a  function  but  did  not 
specify  its  domain  or  range.  The  domain  will  depend  on  where  the  power  series  converges;  however, 
for  combinatorial  applications,  there  is  usually  no  need  to  be  concerned  with  the  convergence  of 
the  power  series.  As  a  result  of  this,  we  will  often  ignore  the  issue  of  convergence.  In  fact,  we  can 
treat  the  power  series  like  a  polynomial  with  an  infinite  number  of  terms.  The  domain  in  which  the 
power  series  converges  does  matter  when  we  study  asymptotics,  but  that  is  still  several  sections  in 
the  future. 

If we have a doubly indexed sequence $b_{i,j}$, we can extend the definition of a generating function:

$$B(x,y) = \sum_{j \ge 0} \sum_{i \ge 0} b_{i,j} x^i y^j = \sum_{i,j=0}^{\infty} b_{i,j} x^i y^j.$$

Clearly,  we  can  extend  this  idea  to  any  number  of  indices — we're  not  limited  to  just  one  or  two. 

Definition 10.2 $[x^n]$ Given a generating function A(x) we use $[x^n]\,A(x)$ to denote $a_n$, the
coefficient of $x^n$. For a generating function in more variables, the coefficient may be another
generating function. For example $[x^n y^k]\,B(x,y) = b_{n,k}$ and $[x^n]\,B(x,y) = \sum_{i \ge 0} b_{n,i}\, y^i$.

Implicit  in  the  preceding  definition  is  the  fact  that  the  generating  function  uniquely  determines  its 
coefficients.  In  other  words,  given  a  generating  function  there  is  just  one  sequence  that  gives  rise  to 
it.  Without  this  uniqueness,  generating  functions  would  be  of  little  use  since  we  wouldn't  be  able  to 
recover  the  coefficients  from  the  function  alone. 

This leads to another question. Given a generating function, say A(x), how can we find its coefficients
$a_0, a_1, \ldots$? One possibility is that we might know the sequence already and simply recognize
its generating function. Another is Taylor's Theorem. We'll phrase it slightly differently here to avoid
questions of convergence. In our form, it is practically a tautology.

Theorem 10.1 Taylor's Theorem If A(x) is the generating function for a sequence
$a_0, a_1, \ldots$, then $a_n = A^{(n)}(0)/n!$, where $A^{(n)}$ is the nth derivative of A and 0! = 1. (The theorem
extends to more than one variable, but we will not state it.)

We  stated  this  to  avoid  questions  of  convergence — but  don't  we  have  to  worry  about  convergence  of 
infinite  series?  Yes  and  no: 

When  manipulating  generating  functions  we  normally  do  not  need  to  worry  about  convergence  unless 
we  are  doing  asymptotics  (see  Section  11.4)  or  substituting  numbers  for  the  variables  (see  the  next 
example). 

Example 10.1 Binomial coefficients Let's use the binomial coefficients to get some practice.
Set $a_{k,n} = \binom{n}{k}$. Remember that $a_{k,n} = 0$ for $k > n$. From the Binomial Theorem,
$(1+x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$. Thus $\sum_k a_{k,n} x^k = (1+x)^n$ and so

$$A(x,y) = \sum_{n \ge 0} \sum_{k \ge 0} a_{k,n} x^k y^n = \sum_{n \ge 0} (1+x)^n y^n = \sum_{n=0}^{\infty} ((1+x)y)^n.$$

From the formula $\sum_{k \ge 0} a z^k = a/(1-z)$ for summing a geometric series we have

$$A(x,y) = \frac{1}{1-(1+x)y} = \frac{1}{1-y-xy}. \qquad (10.1)$$

Let's see what we can get from this.
• From our definitions, $[x^k y^n]\,A(x,y) = \binom{n}{k}$ and $[y^n]\,A(x,y) = (1+x)^n$, which is equivalent to

$$\sum_{k=0}^{n} \binom{n}{k} x^k = (1+x)^n. \qquad (10.2)$$

Of course, this is nothing new; it's what we started out with when we worked out the formula
for A(x,y). We just did this to become more familiar with the notation and manipulation.

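• Now let's look at $[x^k]\,A(x,y)$. From (10.1) and the formula for a geometric series,

$$A(x,y) = \frac{1}{(1-y)-xy} = \frac{1/(1-y)}{1 - xy/(1-y)} = \sum_{k \ge 0} \frac{1}{1-y} \left(\frac{xy}{1-y}\right)^k = \sum_{k \ge 0} \frac{y^k}{(1-y)^{k+1}}\, x^k.$$

Thus $[x^k]\,A(x,y) = \frac{y^k}{(1-y)^{k+1}}$. In other words, we have the generating function

$$\sum_{n \ge 0} \binom{n}{k} y^n = \frac{y^k}{(1-y)^{k+1}}. \qquad (10.3)$$

This is new and we'll get more in a minute.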

• We can replace the x and y in our generating functions by numbers. If we do that in (10.2) it's
not very interesting. Let's do it in (10.3). We must be careful: The sum on the left side is infinite
and so convergence is an issue. With y = 1/3 we have

$$\sum_{n \ge 0} \binom{n}{k} 3^{-n} = \frac{(1/3)^k}{(2/3)^{k+1}} = \frac{3}{2^{k+1}}$$

and it can be shown that the sum converges. So this is a new result. On the other hand, if we
set y = 2 instead the series would have been $\sum \binom{n}{k} 2^n$, which diverges to infinity. The right side
of (10.3) is not infinity but $(-1)^{k+1} 2^k$, which is nonsensical for a sum of positive terms. That's
a warning that something is amiss, namely a lack of convergence.

• Returning to (10.1), let's set x = y. In that case, we obtain

$$\sum_{n,k \ge 0} \binom{n}{k} x^{n+k} = A(x,x) = \frac{1}{1-x-x^2}.$$

What is the coefficient of $x^m$ on the left side? You should be able to see that it will be the sum
of $\binom{n}{k}$ over all n and k such that n + k = m. Thus n = m − k and so

$$\sum_{k \ge 0} \binom{m-k}{k} = [x^m]\, \frac{1}{1-x-x^2}.$$

In the next section, we will see how to obtain such coefficients, which turn out to be the Fibonacci
numbers. Convergence is not an issue: the sum on the left is finite since the binomial coefficients
are nonzero only when $m - k \ge k$, that is $k \le m/2$. □
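
Two of the claims in this example are easy to test numerically. The Python sketch below is an illustration only, and its function names are hypothetical. It compares a long partial sum of $\sum_n \binom{n}{k} y^n$ at y = 1/3 with $y^k/(1-y)^{k+1}$, and compares the diagonal sums $\sum_k \binom{m-k}{k}$ with the coefficients of $1/(1-x-x^2)$. Those coefficients $c_m$ are obtained by equating coefficients in $(1-x-x^2)\,C(x) = 1$, which gives $c_0 = c_1 = 1$ and $c_m = c_{m-1} + c_{m-2}$.

```python
from fractions import Fraction
from math import comb

def binomial_series_partial(k, y, terms=400):
    """Partial sum of sum_{n>=0} C(n,k) y^n (the tail is negligible when |y| is small)."""
    return sum(comb(n, k) * y**n for n in range(terms))

def closed_form(k, y):
    """The closed form y^k / (1-y)^(k+1) for the same series."""
    return y**k / (1 - y)**(k + 1)

def diagonal_sum(m):
    """Sum of C(m-k, k) over 0 <= k <= m/2."""
    return sum(comb(m - k, k) for k in range(m // 2 + 1))

def reciprocal_coeffs(count):
    """First `count` coefficients of 1/(1 - x - x^2), with count >= 2."""
    c = [1, 1]
    while len(c) < count:
        c.append(c[-1] + c[-2])
    return c[:count]

y = Fraction(1, 3)
for k in range(4):
    gap = binomial_series_partial(k, y) - closed_form(k, y)
    assert abs(float(gap)) < 1e-12
assert [diagonal_sum(m) for m in range(10)] == reciprocal_coeffs(10)
```

Exact rational arithmetic is used for y = 1/3 so that the only discrepancy is the truncated tail of the series.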

There  are  two  important  differences  in  the  study  of  generating  functions  here  and  in  calculus. 
We've  already  noted  one:  convergence  is  usually  not  an  issue  as  long  as  we  know  the  coefficients  make 
sense.  The  second  is  that  our  interest  is  in  the  reverse  direction:  We  study  generating  functions  to 
learn  about  their  coefficients  but  in  calculus  one  studies  the  coefficients  to  learn  about  the  functions. 
For  example,  one  might  use  the  first  few  terms  of  the  sum  to  estimate  the  value  of  the  function. 

The  following  simple  theorem  is  important  in  combinatorial  uses  of  generating  functions.  Some 
applications  can  be  found  in  the  exercises.  It  plays  a  crucial  role  in  the  Rule  of  Product  in  Section  10.4. 
Later,  we  will  extend  the  theorem  to  generating  functions  with  more  than  one  variable. 

Theorem 10.2 Convolution Formula Let A(x), B(x), and C(x) be generating functions.
Then C(x) = A(x)B(x) if and only if

$$c_n = \sum_{k=0}^{n} a_k b_{n-k} \quad \text{for all } n \ge 0. \qquad (10.6)$$

The sum can also be written $\sum_{k=0}^{n} a_{n-k} b_k$ and also as the sum of $a_i b_j$ over all i, j such that
i + j = n. We call (10.6) a convolution.

Proof: You should have no difficulty verifying that the two other forms given for the sum are in
fact the same as $\sum a_k b_{n-k}$.

We first prove that C(x) = A(x)B(x) gives the claimed summation. Since we are not concerning
ourselves with convergence, we can multiply generating functions like polynomials:

$$A(x)B(x) = \Bigl(\sum_{k \ge 0} a_k x^k\Bigr)\Bigl(\sum_{j \ge 0} b_j x^j\Bigr) = \sum_{k,j \ge 0} a_k b_j x^{k+j} = \sum_{n \ge 0} \Bigl(\sum_{k=0}^{n} a_k b_{n-k}\Bigr) x^n,$$

where the last equality follows by letting k + j = n; that is, j = n − k. The sum on k stops at n because
$j \ge 0$ is equivalent to $n - k \ge 0$, which is equivalent to $k \le n$. This proves that C(x) = A(x)B(x)
implies (10.6).

Now suppose we are given (10.6). Multiply by $x^n$, sum over $n \ge 0$, let j = n − k and reverse the
steps in the previous paragraph to obtain

$$C(x) = \sum_{n \ge 0} c_n x^n = \sum_{k,j \ge 0} a_k b_j x^{k+j} = A(x)B(x).$$

We've omitted a few computational details that you should fill in. □
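
For finite sequences (polynomials), the Convolution Formula is ordinary polynomial multiplication on coefficient lists. The following minimal Python sketch, with the hypothetical name `convolve`, collects each product $a_k b_j$ into the coefficient of $x^{k+j}$, exactly as in the proof.

```python
def convolve(a, b):
    """Coefficients of A(x)B(x), given coefficient lists a and b (lowest degree first)."""
    c = [0] * (len(a) + len(b) - 1)
    for k, ak in enumerate(a):
        for j, bj in enumerate(b):
            c[k + j] += ak * bj   # the term a_k b_j contributes to x^(k+j)
    return c

# (1 + x)^2 times (1 + x)^3 should give the coefficients of (1 + x)^5.
print(convolve([1, 2, 1], [1, 3, 3, 1]))
```

Each entry of the result is $c_n = \sum_k a_k b_{n-k}$, with terms outside the lists treated as zero.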

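Here are a few generating functions that are useful to know about. The first you've already
encountered, the second appears in Exercise 10.1.4, the third is an application of the convolution
formula (Exercise 10.1.6), and the others are results from calculus.

$$\sum_{k=0}^{\infty} (a r^k) x^k = \frac{a}{1 - rx} \qquad (10.7)$$

$$\sum_{k=0}^{\infty} \binom{r}{k} x^k = (1+x)^r \quad \text{where } \binom{r}{k} = \frac{r(r-1)\cdots(r-k+1)}{k!} \text{ for all } r \qquad (10.8)$$

$$\sum_{n=0}^{\infty} \Bigl(\sum_{k=0}^{n} a_k\Bigr) x^n = \frac{1}{1-x} \sum_{n \ge 0} a_n x^n \qquad (10.9)$$

$$\sum_{k=0}^{\infty} \frac{a^k x^k}{k!} = e^{ax} \qquad (10.10)$$

$$\sum_{k=1}^{\infty} \frac{a^k x^k}{k} = -\ln(1 - ax) \qquad (10.11)$$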


Exercises 


These  exercises  will  give  you  some  practice  manipulating  generating  functions. 

10.1.1. Let $p = 1 + x + x^2 + x^3$, $q = 1 + x + x^2 + x^3 + x^4$, and $r = \frac{1}{1-x}$.

(a)  Find  the  coefficient  of      in  p^;  in  p^;  in  p^. 

(b)  Find  the  coefficient  of  a;^  in  q^;  in  q^;  in  q'^. 

(c)  Find  the  coefficient  of  x^  in  r^;  in  r^;  in  r^. 

(d)  Can  you  offer  a  simple  explanation  for  the  fact  that  p,  q  and  r  all  gave  the  same  answers? 

(e)  Repeat  (a)-(c),  this  time  finding  the  coefficient  of  x'^ .  Explain  why  some  are  equal  and  some  are 
not. 

10.1.2.  Find  the  coefficient  of      in  each  of  the  following. 

(a)  {2  +  X  +  x'^){l  +  2x  +  x'^){l  +  X  +  2x^) 

(b)  {2  +  x  +  x^){l  +  2x  +  x^f{l  +  x  +  2x^f 

(c)  x{l  +  x)^^{2-xf 

10.1.3.  Find  the  coeflicient  of  x^^  in  {x^  +      +      +      +  x^f. 

Hint.  If  you  are  clever,  you  can  do  this  without  a  lot  of  calculation. 

10.1.4. This exercise explores the general binomial theorem, geometric series and related topics. Part (a)
requires calculus.

(a) Let r be any real number. Use Taylor's Theorem without worrying about convergence to prove

$$(1+z)^r = \sum_{k \ge 0} \binom{r}{k} z^k \quad \text{where} \quad \binom{r}{k} = \frac{r(r-1)\cdots(r-k+1)}{k!}.$$

If  you're  familiar  with  some  form  of  Taylor's  Theorem  with  remainder,  use  it  to  show  that,  for 
some  C  >  0,  the  infinite  sum  converges  when  \z\  <  C.  (The  largest  possible  value  is  C  =  1,  but 
you  may  find  it  easier  to  use  a  smaller  value.) 

(b) Use the previous result to obtain the geometric series formula:

$\sum_{k \ge 0} a z^k = a/(1-z)$.

(c) Show that $\sum_{k=0}^{n} a z^k = (a - a z^{n+1})/(1-z)$.

(d)  Find  a  simple  formula  for  the  coefficient  of  x"  in  (1  —  ax)^"^ . 

10.1.5. In this exercise we'll explore the effect of derivatives. Let $A(x) = \sum_{m=0}^{\infty} a_m x^m$ be the ordinary
generating function for the sequence a. In each case, first answer the question for k = 1 and k = 2
and then for general k.

(a) What is $[x^n]\,(x^k A(x))$, that is, the coefficient of $x^n$ in $x^k A(x)$?

(b) Show that $[x^n] \left(\frac{d}{dx}\right)^k A(x) = \frac{(n+k)!\, a_{n+k}}{n!}$. This notation means compute the kth derivative
of A(x) and then find the coefficient of $x^n$ in the generating function. It can also be written
$[x^n]\, A^{(k)}(x)$.

(c) Show that $[x^n] \left(x \frac{d}{dx}\right)^k A(x) = n^k a_n$. This notation means that you repeat alternately the
operations of differentiating and multiplying by x a total of k times each. For example, when
k = 2 we have $x(xA'(x))'$.

10.1.6. Using Theorem 10.2 or otherwise, do the following.

(a) Prove: If $c_n = a_0 + a_1 + \cdots + a_n$, then $C(x) = A(x)/(1 - x)$.

(b) Simplify $\binom{n}{0} - \binom{n}{1} + \cdots + (-1)^k \binom{n}{k}$ when $n > 0$.

(c) Suppose that $d_n$ is the sum of $a_i b_j c_k$ over all $i, j, k \ge 0$ such that i + j + k = n. Express D(x)
in terms of A(x), B(x), and C(x).

10.1.7. Suppose that $|r| < 1$. Obtain a formula for $\sum_{n \ge 0} \binom{n}{k} r^n$ as a function of k and r. Show that the
sum converges by using the ratio test for series.

10.1.8. Note that $(1+x)^{m+n} = (1+x)^m (1+x)^n$. Note that the coefficients of powers of x in $(1+x)^{m+n}$,
$(1+x)^m$, and $(1+x)^n$ are binomial coefficients. Use Theorem 10.2 to prove Vandermonde's formula:

$$\binom{m+n}{k} = \sum_{i=0}^{k} \binom{m}{i} \binom{n}{k-i}.$$

This  is  one  of  the  many  identities  that  are  known  for  binomial  coefficients. 

Hint.  Remember  that  n  and  A;  in  (10.6)  can  be  replaced  by  other  variables.  Look  at  the  index  and 
limits  on  the  summation. 

10.1.9.  Find  a  simple  expression  for  (jt™^))  where  the  sum  is  over  all  values  of  i  for  which  the 

binomial  coefficients  in  the  sum  are  defined. 

10.1.10. The results given here are referred to as bisection of series. Let $A(x) = \sum_{n=0}^{\infty} a_n x^n$.

(a) Show that $(A(x) + A(-x))/2$ is the generating function for the sequence $b_n$ which is zero for
odd n and equals $a_n$ for even n.

(b) What is the generating function for the sequence which is zero for even n and equals $a_n$ for
odd n?

(c) Evaluate $\sum_{k \ge 0} \binom{n}{2k} x^{2k}$ where x is a real number. In particular, what is $\sum_{k \ge 0} \binom{n}{2k}$?

*10.1.11. Fix $k > 1$ and $0 \le j < k$. If you are familiar with kth roots of unity, generalize Exercise 10.1.10
to the sequence $b_n$ which is $a_n$ when n + j is a multiple of k and is zero otherwise:

$$B(x) = \frac{1}{k} \sum_{s=0}^{k-1} \omega^{js} A(\omega^s x),$$

where $\omega = \exp(2\pi i/k)$, a primitive kth root of unity. (The result is called multisection of series.)

00    y  \ 

10.1.12.  Evaluate  Sfc  =  ^  P^j2"". 

n=0  ^  ^ 

*10.1.13.  Using Exercise 10.1.11, show that

\[ \sum_{n=0}^{\infty} \frac{x^{3n}}{(3n)!} = \frac{e^x}{3} + \frac{2\cos(x\sqrt{3}/2)}{3e^{x/2}} \]

and develop similar formulas for $\sum x^{3n+1}/(3n+1)!$ and $\sum x^{3n+2}/(3n+2)!$.
*10.1.14.  We use the terminology from the Principle of Inclusion and Exclusion (Theorem 4.1 (p. 95)). Also,
let $E_k$ be the number of elements of $S$ that lie in exactly $k$ of the sets $S_1, S_2, \ldots, S_m$.

(a)  Using the Rules of Sum and Product (not Theorem 4.1), prove that

\[ N_r = \sum_{k \ge 0} \binom{k}{r} E_k. \]

(b)  If the generating functions corresponding to $E_0, E_1, \ldots$ and $N_0, N_1, \ldots$ are $E(x)$ and $N(x)$,
conclude that $N(x) = E(x + 1)$.

(c)  Use this to conclude that $E(x) = N(x - 1)$ and then deduce the extension of the Principle of
Inclusion and Exclusion:

\[ E_k = \sum_{i \ge 0} (-1)^i \binom{k+i}{i} N_{k+i}. \]

10.2   Solving  a  Single  Recursion 


In  this  section  we'll  use  ordinary  generating  functions  to  solve  some  simple  recursions,  including  two 
that  we  were  unable  to  solve  previously:  the  Fibonacci  numbers  and  the  number  of  unlabeled  full 
binary  RP-trees. 

Example 10.2  Fibonacci numbers  Let $F_n$ be the number of $n$ long sequences of zeroes and
ones with no consecutive ones. We can easily see that $F_1 = 2$ and $F_2 = 3$, but what is the general
formula?

Suppose that $t_1, \ldots, t_n$ is an arbitrary sequence of the desired form. We want to see what happens
when we remove the end of the sequence, so we assume that $n \ge 1$. If $t_n = 0$, then $t_1, \ldots, t_{n-1}$ is
also an arbitrary sequence of the desired form. Now suppose that $t_n = 1$. Then $t_{n-1} = 0$ and so, if
$n \ge 2$, $t_1, \ldots, t_{n-2}$ is an arbitrary sequence of the desired form. All this is reversible: Suppose that
$n \ge 2$. The following two operations produce all $n$ long sequences of the desired form exactly once.

•  Let $t_1, \ldots, t_{n-1}$ be an arbitrary sequence of the desired form. Set $t_n = 0$.

•  Let $t_1, \ldots, t_{n-2}$ be an arbitrary sequence of the desired form. Set $t_{n-1} = 0$ and $t_n = 1$.

Since all $n$ long sequences of the desired form are obtained exactly once this way, the Rule of Sum
yields the recursion

\[ F_n = F_{n-1} + F_{n-2} \quad \text{for } n \ge 2. \tag{10.12} \]

Here are the first few values.

    n     0   1   2   3   4   5    6    7    8    9    10
    F_n   1   2   3   5   8   13   21   34   55   89   144

These  numbers,  called  the  Fibonacci  numbers,  were  studied  in  Exercise  1.4.10,  but  we  couldn't  solve 
the  recursion  there.  Now  we  will. 
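
The first few values can be checked by brute force. The following is a minimal Python sketch, not part of the original text; the function names are our own, and we assume the starting values $F_0 = 1$ (the empty sequence) and $F_1 = 2$. It enumerates all sequences of zeroes and ones with no consecutive ones and compares the counts with the recursion (10.12).

```python
from itertools import product


def count_by_enumeration(n):
    """Count n-long 0/1 sequences that contain no two consecutive ones."""
    return sum(
        1
        for seq in product((0, 1), repeat=n)
        if all(not (seq[i] and seq[i + 1]) for i in range(n - 1))
    )


def count_by_recursion(n):
    """Compute F_n from F_n = F_{n-1} + F_{n-2}, starting with F_0 = 1, F_1 = 2."""
    prev, cur = 1, 2  # F_0 and F_1
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur


if __name__ == "__main__":
    for n in range(11):
        assert count_by_enumeration(n) == count_by_recursion(n)
    print([count_by_recursion(n) for n in range(11)])
```

Enumeration takes time proportional to $2^n$, so it is only useful as a check for small $n$.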

First, we want to adjust (10.12) so that it holds for all $n \ge 0$. To do this we define $F_n$ when $n$
is small and introduce a new sequence $c_n$ to "correct" the recursion for small $n$:

\[ F_n = F_{n-1} + F_{n-2} + c_n, \tag{10.13} \]

where $F_0 = 1$, $F_k = 0$ for $k < 0$, $c_0 = c_1 = 1$, and $c_n = 0$ for $n \ge 2$. This recursion is now valid for
$n \ge 0$. Let $F(x)$ be the generating function for $F_0, F_1, \ldots$. In the following series of equations, steps
without explanation require only simple algebra.


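\[
\begin{aligned}
F(x) &= \sum_{n=0}^{\infty} F_n x^n && \text{by definition} \\
     &= \sum_{n=0}^{\infty} (F_{n-1} + F_{n-2} + c_n) x^n && \text{by (10.13)} \\
     &= \sum_{n=0}^{\infty} x F_{n-1} x^{n-1} + \sum_{n=0}^{\infty} x^2 F_{n-2} x^{n-2} + \sum_{n=0}^{\infty} c_n x^n \\
     &= x \sum_{i=0}^{\infty} F_i x^i + x^2 \sum_{k=0}^{\infty} F_k x^k + 1 + x \\
     &= xF(x) + x^2 F(x) + 1 + x && \text{by definition.}
\end{aligned}
\]

In summary, $F(x) = 1 + x + (x + x^2)F(x)$. We can easily solve this equation:

\[ F(x) = \frac{1 + x}{1 - x - x^2}. \tag{10.14} \]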


Now what? We want to find a formula for the coefficient of $x^n$ in $F(x)$. We could try using
Taylor's Theorem. Unfortunately, $F^{(n)}(x)$ appears to be extremely messy. What alternative do we
have?

Remember partial fractions from calculus? If not, you should read Appendix D (p. 387). Using
partial fractions, we will be able to write $F(x) = A/(1 - ax) + B/(1 - bx)$ for some constants $a$, $b$, $A$
and $B$. Since the formula for summing geometric series is $1 + ax + (ax)^2 + \cdots = 1/(1 - ax)$, we will
have $F_n = Aa^n + Bb^n$. There is one somewhat sneaky point here. We want to factor a polynomial
of the form $1 + cx + dx^2$ into $(1 - ax)(1 - bx)$. To do this, let $y = 1/x$ and multiply by $y^2$. The result
is $y^2 + cy + d = (y - a)(y - b)$. Thus $a$ and $b$ are just the roots of $y^2 + cy + d = 0$. In our case we
have $y^2 - y - 1 = 0$.

Let's carry out the partial fraction approach. We have

\[ 1 - x - x^2 = (1 - ax)(1 - bx) \quad \text{where} \quad a, b = \frac{1 \pm \sqrt{5}}{2}. \]

(Work it out.) For definiteness, let $a$ be associated with the $+$ and $b$ with the $-$. To get some
idea of the numbers we are working with, $a = 1.618\cdots$ and $b = -.618\cdots$. By expanding in partial
fractions, you should be able to derive

\[ F(x) = \frac{1 + x}{1 - x - x^2} = \frac{1 + a}{\sqrt{5}(1 - ax)} - \frac{1 + b}{\sqrt{5}(1 - bx)}. \]

Now use geometric series and the algebraic observations $1 + a = a^2$ and $1 + b = b^2$ to get

\[ F_n = \frac{a^{n+2}}{\sqrt{5}} - \frac{b^{n+2}}{\sqrt{5}}. \tag{10.15} \]

It is not obvious that this expression is even an integer, much less equal to $F_n$. If you're not convinced,
you might like to calculate a few values.

Since $|b| < 1$, $|b^{n+2}/\sqrt{5}| < 1/\sqrt{5} < 1/2$. Thus we have the further observation that $F_n$ is the
integer closest to $a^{n+2}/\sqrt{5} = (1.618\cdots)^{n+2}/2.236\cdots$. For example, $a^4/\sqrt{5} = 3.065\cdots$, which is close
to $F_2 = 3$, and $a^{12}/\sqrt{5} = 144.001\cdots$, which is quite close to $F_{10} = 144$. Of course, the approximations
get better as $n$ gets larger since the error is bounded by a large power of $b$ and $|b| < 1$. □
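
If you would rather let a machine calculate a few values, the following Python sketch (our own illustration; the function names are hypothetical) compares the closed form $F_n = (a^{n+2} - b^{n+2})/\sqrt{5}$ and the nearest-integer rule with values obtained from the recursion. It uses floating point arithmetic, so it should only be trusted for moderate $n$.

```python
from math import sqrt

SQRT5 = sqrt(5)
A = (1 + SQRT5) / 2  # a = 1.618...
B = (1 - SQRT5) / 2  # b = -0.618...


def fib_exact(n):
    """F_n from the recursion, with F_0 = 1 and F_1 = 2."""
    prev, cur = 1, 2
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur


def fib_closed_form(n):
    """The right side of (10.15), evaluated in floating point."""
    return (A ** (n + 2) - B ** (n + 2)) / SQRT5


def fib_nearest_integer(n):
    """The integer closest to a^(n+2)/sqrt(5)."""
    return round(A ** (n + 2) / SQRT5)


if __name__ == "__main__":
    for n in range(30):
        assert abs(fib_closed_form(n) - fib_exact(n)) < 1e-6
        assert fib_nearest_integer(n) == fib_exact(n)
    print(fib_nearest_integer(10))
```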
The method that we have just used works for many other recursions, so it is useful to lay it out
as a series of steps. Although our description is for a singly indexed recursion, it can be applied to
the multiply indexed case as well.

A procedure for solving recursions  Here is a six step procedure for solving recursions. It
is not guaranteed to work because it may not be possible to carry out all of the steps. Let the
sequence be $a_n$.

1.  Adjust the recursion so that it is valid for all $n$. In particular, $a_n$ should be defined for all $n$
and $a_n = 0$ for $n < 0$. You may need to introduce a "correcting" sequence $c_n$ as in (10.13).

2.  Introduce the generating function $A(x) = \sum_{n \ge 0} a_n x^n$.

3.  Substitute the recursion into the summation for $A(x)$.

4.  Rearrange the result so that you can recognize other occurrences of $A(x)$ and so get rid of
summations. (This is not always possible; it depends on what the recursion is like.)

5.  If possible, solve the resulting equation to obtain an explicit formula for $A(x)$.

6.  By partial fractions, Taylor's Theorem or whatever, obtain an expression for the coefficient
of $x^n$ in this explicit formula for $A(x)$. This is $a_n$.

You  should  go  back  to  the  previous  example  and  find  out  where  each  step  was  done. 

Example 10.3  Fibonacci numbers continued  Setting $y = x$ in (10.1) gives $1/(1 - x - x^2)$,
which we'll call $H(x)$. This is nearly $F(x) = (1 + x)/(1 - x - x^2)$ of the previous example, suggesting
that there is a connection between binomial coefficients and Fibonacci numbers. Let's explore this.

Writing $F(x)/(1 + x) = H(x)$ is not a good idea since the coefficient of $x^n$ on the left side is
$F_n - F_{n-1} + F_{n-2} - \cdots$ and we'd like to find a simpler connection if we can. Writing the equation
as $(1 + x)H(x) = F(x)$ is better since the coefficient of $x^n$ on the left side is just $h_n + h_{n-1}$.
It would be even better if we could avoid the factor of $(1 + x)$ and have a monomial instead, since
then we would not have to add two terms together. You might like to try to find something like that.
After some work, we found that $1 + xF(x) = H(x)$, which is easily verified by using the formulas for
$H(x)$ and $F(x)$. You should convince yourself that for $n > 0$ the coefficient of $x^n$ on the left side is
$F_{n-1}$ and so $F_n = h_{n+1}$. In fact, some people call 1, 1, 2, 3, ... the Fibonacci numbers and then $h_n$ is
the $n$th Fibonacci number and $1/(1 - x - x^2)$ is the generating function for the Fibonacci numbers.
Still others call 0, 1, 1, 2, 3, ... the Fibonacci numbers and then $x/(1 - x - x^2)$ is the generating
function for them. Anyway, our Fibonacci number $F_n$ is the coefficient of $x^{n+1}$ in
$H(x)$. By (10.1),

\[ H(x) = \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} \binom{i}{j} x^{i+j}. \]

Note that the coefficient of $x^{n+1}$ on the right side is the sum of $\binom{i}{j}$ over all nonnegative $i$ and $j$ such
that $i + j = n + 1$. Hence

\[ F_n = \sum_{j=0}^{n+1} \binom{n+1-j}{j}. \]

This is such a simple expression that it should have
a direct proof. We leave that as an exercise. □
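
The binomial sum $F_n = \sum_j \binom{n+1-j}{j}$ is easy to test numerically. The short Python sketch below is our own illustration (the function names are hypothetical); it compares the sum with values from the recursion $F_n = F_{n-1} + F_{n-2}$, $F_0 = 1$, $F_1 = 2$. A numerical check is of course not a proof.

```python
from math import comb


def fib_binomial_sum(n):
    """Sum of C(n+1-j, j) over j = 0, ..., n+1; terms with j > n+1-j are zero."""
    return sum(comb(n + 1 - j, j) for j in range(n + 2))


def fib_recursive(n):
    """F_n from F_n = F_{n-1} + F_{n-2}, with F_0 = 1 and F_1 = 2."""
    prev, cur = 1, 2
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur


if __name__ == "__main__":
    for n in range(25):
        assert fib_binomial_sum(n) == fib_recursive(n)
    print([fib_binomial_sum(n) for n in range(8)])
```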


278       Chapter  10    Ordinary  Generating  Functions 


Example 10.4  The worst case time for merge sorting  Let $M(n)$ be the maximum number
of comparisons needed to merge sort a list of $n$ items. (Merge sorting was discussed in Example 7.13
and elsewhere.) The best way to do a merge sort is to split the list as evenly as possible. If $n$ is
even, we can divide the list exactly in half. It takes at most $M(n/2)$ comparisons to merge sort each
of the two halves and then at most $n - 1 < n$ comparisons to merge the two resulting lists. Thus
$M(n) \le n + 2M(n/2)$. We'd like to use this to define a recursion, but there's a problem: $n/2$ may
not be even.

How can we avoid this? We can just look at those values of $n$ which are powers of 2. For example,
the fact that $M(1) = 0$ gives us

\[ M(8) \le 8 + 2M(4) \le 8 + 2(4 + 2M(2)) \le 8 + 2\bigl(4 + 2(2 + 2M(1))\bigr) = 8 + 2(4 + 4) = 24. \]

How can we set up a recursion that only looks at values of $n$ which are a power of 2? We let
$m_k = M(2^k)$. Then

\[ m_0 = M(1) = 0 \quad \text{and} \quad m_k = M(2^k) \le 2^k + 2M(2^{k-1}) = 2^k + 2m_{k-1}. \]

So far we have only talked about solving recursive relations that involve equality, but this is an
inequality. What can we do about that?
If we define $c_k$ by

\[ c_0 = 0 \quad \text{and} \quad c_k = 2^k + 2c_{k-1} \text{ for } k > 0, \tag{10.16} \]

then it follows that $m_k \le c_k$. We'll solve (10.16) and so get a bound for $m_k = M(2^k)$.

Before  calculating  the  general  solution,  it  may  be  useful  to  use  the  recursion  to  calculate  a  few 
values.  This  might  lead  us  to  guess  what  the  solution  is.  Even  if  we  can't  guess  the  solution,  we'll 
have  some  special  cases  of  the  general  solution  available  so  that  we'll  be  able  to  partially  check  the 
general  solution  when  we  finally  get  it.  It's  a  good  idea  to  get  in  the  habit  of  making  such  checks 
because  it  is  very  easy  to  make  algebra  errors  when  manipulating  generating  functions. 

From (10.16), the first few values of $c_k$ are

\[ c_0 = 0, \quad c_1 = 2, \quad c_2 = 2 \cdot 2^2 = 2^3, \quad c_3 = 3 \cdot 2^3 \quad \text{and} \quad c_4 = 4 \cdot 2^4. \]

This strongly suggests that $c_k = k2^k$. You should verify that this is correct by using (10.16) and
induction.
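
Calculating a few values is easy to automate. The following Python sketch (an illustration of ours; the function name is hypothetical) iterates the recursion $c_0 = 0$, $c_k = 2^k + 2c_{k-1}$ and compares the result with the guess $k2^k$. It checks finitely many cases only and does not replace the induction.

```python
def c_values(k_max):
    """Return [c_0, ..., c_{k_max}] where c_0 = 0 and c_k = 2^k + 2 c_{k-1}."""
    c = [0]
    for k in range(1, k_max + 1):
        c.append(2 ** k + 2 * c[-1])
    return c


if __name__ == "__main__":
    values = c_values(20)
    assert all(values[k] == k * 2 ** k for k in range(21))
    print(values[:5])
```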

Since  we  have  the  answer,  why  bother  with  generating  functions?  We  want  to  study  generating 
function  techniques  so  that  you  can  use  them  in  situations  where  you  can't  guess  the  answer.  This 
problem  is  a  rather  simple  one,  so  the  algebra  won't  obscure  the  use  of  the  techniques. 

For Step 1, rewrite (10.16) as

\[ c_k = 2^k + 2c_{k-1} + a_k \quad \text{for } k \ge 0, \]

where $c_k = 0$ for $k < 0$, $a_0 = -1$, and $a_n = 0$ for $n > 0$. Now

\[
\begin{aligned}
C(x) &= \sum_{k=0}^{\infty} c_k x^k && \text{This is Step 2.} \\
     &= \sum_{k=0}^{\infty} (2^k + 2c_{k-1} + a_k) x^k && \text{This is Step 3.} \\
     &= \sum_{k=0}^{\infty} (2x)^k + 2x \sum_{k=0}^{\infty} c_{k-1} x^{k-1} - 1 \\
     &= \frac{1}{1 - 2x} + 2xC(x) - 1. && \text{This is Step 4.}
\end{aligned}
\]

For Step 5 we have $C(x) = 2x/(1 - 2x)^2$. Partial fractions (Step 6) leads to

\[ C(x) = \frac{2x}{(1 - 2x)^2} = \frac{1}{(1 - 2x)^2} - \frac{1}{1 - 2x} = \sum_{k=0}^{\infty} \binom{-2}{k} (-2x)^k - \sum_{k=0}^{\infty} (2x)^k. \]

Thus $c_k = 2^k \bigl( (-1)^k \binom{-2}{k} - 1 \bigr) = k2^k$. Hence $M(n) \le n \log_2 n$ when $n$ is a power of 2. How good is
this bound? What happens when $n$ is not a power of 2? It turns out that $n \log_2 n$ is a fairly good
estimate for $M(n)$ for all $n$, but we won't prove it. □

Perhaps you've noticed that when we obtain a rational function (i.e., a quotient of two polynomials)
as a generating function, the denominator is, in some sense, the important part. We can state
this more precisely: For rational generating functions, the recursion determines the denominator and
the initial conditions interacting with the recursion determine the numerator. No proof of this claim
will be given here. A related observation is that, if we have the same denominators for two rational
generating functions $A(x)$ and $B(x)$ that have been reduced to lowest terms, then the coefficients
$a_n$ and $b_n$ have roughly the same rate of growth for large $n$; i.e., we usually have $a_n = \Theta(b_n)$.*

Example 10.5  Counting unlabeled full binary RP-trees  Let $b_n$ be the number of unlabeled
full binary RP-trees with $n$ leaves. By Example 9.4 (p. 251), the number of such trees is the Catalan
number $C_{n-1}$. See Example 1.13 (p. 15) for more examples of things that are counted by the Catalan
numbers.

The recursion

\[ b_n = \sum_{k=1}^{n-1} b_k b_{n-k} \quad \text{if } n > 1 \tag{10.17} \]

with $b_1 = 1$ was derived as (9.3). Recall that $b_1 = 1$ and $b_0$ was not defined. Let's use our procedure
to find $b_n$. Here it is, step by step.

1.  Since (10.17) is nearly a convolution, we define $b_0 = 0$ to make it a convolution:

\[ b_n = \sum_{k=0}^{n} b_k b_{n-k} + a_n, \]

where $a_1 = 1$ and $a_n = 0$ for $n \ne 1$.

2.  Let $B(x) = \sum_{n \ge 0} b_n x^n$.

3.  $B(x) = \sum_{n \ge 0} \sum_{k=0}^{n} b_k b_{n-k} x^n + x$.

4.  By the formula for convolutions, we now have

\[ B(x) = B(x)B(x) + x. \tag{10.18} \]

5.  The quadratic equation $B = x + B^2$ has the solution $B = (1 \pm \sqrt{1 - 4x})/2$. Since $B(0) = b_0 = 0$,
the minus sign is the correct choice. Thus

\[ B(x) = \frac{1 - \sqrt{1 - 4x}}{2}. \]

6.  By Exercise 10.1.4,

\[ (1 + z)^r = \sum_{n \ge 0} \binom{r}{n} z^n, \quad \text{where} \quad \binom{r}{n} = \frac{r(r-1)\cdots(r-n+1)}{n!}. \]

*  This notation is discussed in Appendix B. It means there exist positive constants $A$ and $B$ such
that $A a_n \le b_n \le B a_n$.
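Now for some algebra. With $n > 0$, $r = 1/2$ and $z = -4x$ we obtain

\[
\begin{aligned}
b_n &= -\frac{1}{2} \binom{1/2}{n} (-4)^n
     = -\frac{1}{2} \, \frac{\frac{1}{2}(\frac{1}{2} - 1)(\frac{1}{2} - 2) \cdots (\frac{1}{2} - n + 1)}{n!} \, (-4)^n \\
    &= 2^{n-1} \, \frac{(-1 + 2)(-1 + 4) \cdots (-1 + 2n - 2)}{n!}
     = \frac{2^{n-1}(n-1)!}{(n-1)!} \cdot \frac{(-1 + 2)(-1 + 4) \cdots (-1 + 2n - 2)}{n!} \\
    &= \frac{2 \cdot 4 \cdots (2n - 2)}{(n-1)!} \cdot \frac{1 \cdot 3 \cdots (2n - 3)}{n!}
     = \frac{(2n - 2)!}{(n-1)! \, n!} = \frac{1}{n} \binom{2n-2}{n-1}.
\end{aligned}
\]

As remarked at the beginning of the example, this number is the Catalan number $C_{n-1}$. Thus
$C_n = \frac{1}{n+1} \binom{2n}{n}$. □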
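
As a check on the algebra in this example, the Python sketch below (our own illustration; the function names are hypothetical) computes $b_n$ directly from the convolution recursion $b_n = \sum_{k=1}^{n-1} b_k b_{n-k}$ with $b_0 = 0$, $b_1 = 1$ and compares it with the closed form $\frac{1}{n}\binom{2n-2}{n-1}$.

```python
from math import comb


def trees_by_convolution(n_max):
    """Return [b_0, ..., b_{n_max}] using b_n = sum_{k=1}^{n-1} b_k b_{n-k}."""
    b = [0, 1]  # b_0 = 0 by convention, b_1 = 1
    for n in range(2, n_max + 1):
        b.append(sum(b[k] * b[n - k] for k in range(1, n)))
    return b[: n_max + 1]


def trees_by_formula(n):
    """The closed form (1/n) * C(2n-2, n-1), valid for n >= 1."""
    return comb(2 * n - 2, n - 1) // n


if __name__ == "__main__":
    b = trees_by_convolution(15)
    assert all(b[n] == trees_by_formula(n) for n in range(1, 16))
    print(b[:8])
```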


Exercises 


10.2.1.  Solve the following recursions by using generating functions.

(a)  $a_0 = 0$, $a_1 = 1$ and $a_n = 5a_{n-1} - 6a_{n-2}$ for $n > 1$.

(b)  $a_0 = a_1 = 1$ and $a_{n+1} = a_n + 6a_{n-1}$ for $n > 0$.

(c)  $a_0 = 0$, $a_1 = a_2 = 1$ and $a_n = a_{n-1} + a_{n-2} + 2a_{n-3}$ for $n > 2$.

(d)  $a_0 = 0$ and $a_n = 2a_{n-1} + n$ for $n > 0$.

10.2.2.  Let $S(n)$ be the number of moves needed to solve the Towers of Hanoi puzzle. In Exercise 7.3.9 you
were asked to show that $S(1) = 1$ and $S(n) = 2S(n - 1) + 1$ for $n > 1$.

(a)  Use this recursion to obtain the generating function for $S$.

(b)  Use the generating function to determine $S(n)$.

10.2.3.  Show without generating functions that $\binom{n-i+1}{i}$ is the number of $n$ long sequences of zeroes and ones
with exactly $i$ ones, none of them adjacent. Use this result to prove the formula $F_n = \sum_{i \ge 0} \binom{n-i+1}{i}$
that was derived in Example 10.3 via generating functions.

10.2.4.  Let $s_n$ be the number of $n$ long sequences of zeroes, ones and twos with no adjacent ones and no
adjacent twos. Let $s_0 = 1$; i.e., there is one empty sequence.

(a)  Let $k$ be the position of the last zero in such a sequence. If there is no zero, set $k = 0$. Show that
the last $n - k$ elements in the sequence consist of an alternating pattern of ones and twos and
that the only restriction on the first $k - 1$ elements in the sequence is that there be no adjacent
ones and no adjacent twos.

(b)  By considering all possibilities for $k$ in (a), conclude that, for $n > 0$,

\[ s_n = 2 + 2s_0 + 2s_1 + \cdots + 2s_{n-2} + s_{n-1}. \]

(c)  Use the convolution formula to deduce

\[ S(x) = (1 + 2x + 2x^2 + 2x^3 + \cdots)(1 + s_0 x + s_1 x^2 + s_2 x^3 + \cdots) = \Bigl(1 + \frac{2x}{1 - x}\Bigr)(1 + xS(x)). \]

(d)  Conclude that $S(x) = (1 + x)/(1 - 2x - x^2)$.

(e)  Find a formula for $s_n$ and check it for $n = 0, 1, 2$.

(f)  Show that $s_n$ is the integer closest to $(1 + \sqrt{2})^{n+1}/2$.
10.2.5.  The usual method for multiplying two polynomials of degree $n - 1$, say

\[ P_1(x) = a_{0,1} + a_{1,1}x + \cdots + a_{n-1,1}x^{n-1} \quad \text{and} \quad P_2(x) = a_{0,2} + a_{1,2}x + \cdots + a_{n-1,2}x^{n-1}, \]

requires $n^2$ multiplications to form the products $a_{i,1}a_{j,2}$ for $0 \le i, j < n$. These are added together
in the appropriate way to form the $2n - 1$ sums that constitute the coefficients of the product
$P_1(x)P_2(x)$. There is a less direct method that requires fewer multiplications. For simplicity, suppose
that $n = 2m$.

•  First, split the polynomials in "half": $P_i(x) = L_i(x) + x^m H_i(x)$, where $L_i$ and $H_i$ have
degree at most $m - 1$.

•  Second, let $A = H_1H_2$, $B = L_1L_2$ and $C = (H_1 + L_1)(H_2 + L_2)$.

•  Third, note that $P_1P_2 = Ax^{2m} + B + (C - A - B)x^m$.

(a)  Prove that the formula for $P_1P_2$ is correct.

(b)  Let $M(n)$ be the least number of multiplications we need in a general purpose algorithm for
multiplying two polynomials of degree $n - 1$. Show that $M(2m) \le 3M(m)$.

(c)  Use the previous result to derive an upper bound for $M(n)$ when $n$ is a power of 2 that is better
than $n^2$. (Your answer should be $M(n) \le n^c$ where $c = 1.58\cdots$.) How does this bound compare
with $n^2$ when $n = 2^{10} = 1024$?

Your bound will give a bound for all $n$ since, if $n \le 2^k$, we can fill the polynomials out to
degree $2^k$ by introducing high degree terms with zero coefficients. This gives $M(n) \le M(2^k)$.

(d)  Show how the method used to obtain the bound multiplies $1 + 2x - x^2 + 3x^3$ and $5 + 2x - x^3$.

*(e)  It  may  be  objected  that  our  method  could  lead  to  such  a  large  number  of  additions  and 
subtractions  that  the  savings  in  multiplication  may  be  lost.  Does  this  happen?  Justify  your 
answer. 
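
The three-step method of Exercise 10.2.5 can be sketched in Python as follows. This is only an illustration under our own assumptions: polynomials are lists of coefficients (constant term first), both lists have the same length $n$, and $n$ is a power of 2. The function names are hypothetical. The sketch applies the split recursively and checks the result against the usual method; it does not answer the questions posed in the exercise.

```python
def multiply_schoolbook(p, q):
    """The usual method: form every product p[i] * q[j] and add them up."""
    result = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            result[i + j] += pi * qj
    return result


def multiply_three_products(p, q):
    """Multiply two coefficient lists of equal length n, n a power of 2."""
    n = len(p)
    if n == 1:
        return [p[0] * q[0]]
    m = n // 2
    low1, high1 = p[:m], p[m:]   # P1 = L1 + x^m H1
    low2, high2 = q[:m], q[m:]   # P2 = L2 + x^m H2
    a = multiply_three_products(high1, high2)              # A = H1 H2
    b = multiply_three_products(low1, low2)                # B = L1 L2
    c = multiply_three_products(
        [h + l for h, l in zip(high1, low1)],
        [h + l for h, l in zip(high2, low2)],
    )                                                      # C = (H1 + L1)(H2 + L2)
    result = [0] * (2 * n - 1)
    for i in range(2 * m - 1):
        result[i] += b[i]                                  # B
        result[i + m] += c[i] - a[i] - b[i]                # (C - A - B) x^m
        result[i + 2 * m] += a[i]                          # A x^(2m)
    return result


if __name__ == "__main__":
    p, q = [1, 2, 3, 4], [5, 6, 7, 8]
    assert multiply_three_products(p, q) == multiply_schoolbook(p, q)
    print(multiply_three_products(p, q))
```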

10.2.6.  Let $t_n$ be the number of $n$-vertex unlabeled binary RP-trees. (Each vertex has 0, 1 or 2 children.)

(a)  Derive the recursion

\[ t_1 = 1 \quad \text{and} \quad t_{n+1} = t_n + \sum_{k=1}^{n-1} t_k t_{n-k} \quad \text{for } n > 0. \]

(b)  With $t_0 = 0$, derive an equation for the generating function $T(x) = \sum_{n \ge 0} t_n x^n$.

(c)  Solve the equation in (b) to obtain

\[ T(x) = \frac{1 - x - \sqrt{1 - 2x - 3x^2}}{2x} \]

and explain the choice of sign before the square root.

10.2.7.  Let $c_1, \ldots, c_k$ be arbitrary real numbers. If you are familiar with partial fractions, explain why the
solution to the recursion $a_n = c_1 a_{n-1} + \cdots + c_k a_{n-k}$ has the form $a_n = \sum_{i=1}^{m} P_i(n) r_i^n$ for all
sufficiently large $n$, where $P_i(n)$ is a polynomial of degree less than $d_i$, the $r_i$ are all different, and

\[ 1 - c_1 x - \cdots - c_k x^k = \prod_{i=1}^{m} (1 - r_i x)^{d_i}. \]

How can the polynomials $P_i(n)$ be found without using partial fractions?
10.3    Manipulating  Generating  Functions


Almost  anything  we  do  with  generating  functions  can  be  regarded  as  manipulation,  so  what  does 
the  title  of  this  section  refer  to?  We  mean  the  use  of  tools  from  algebra  and  calculus  to  obtain 
information  from  generating  functions.  We've  already  seen  some  examples  of  one  tool  being  used: 
partial  fractions.  In  this  section  we'll  focus  on  two  others;  (i)  the  manipulation  of  generating  functions 
to  obtain,  when  possible,  simple  recursions  and  (ii)  the  interplay  of  derivatives  with  generating 
functions.  Some  familiarity  with  calculus  is  required.  The  results  in  this  section  are  used  some  in 
later  sections,  but  they  are  not  essential  for  understanding  the  concepts  introduced  there. 

Obtaining  Recursions 


Suppose we have an equation that determines a generating function $B(x)$; for example,
$B(x) = (1 - \sqrt{1 - 4x})/2$. The basic idea for obtaining a recursion for $B(x)$ is to rewrite the equation
so that $B(x)$ appears in expressions that are simple and so that the remaining expressions are easy
to expand in power series. Once a simple form has been found, equate coefficients of $x^n$ on the two
sides of the equation. We'll explore this idea here.

Example 10.6  Rational functions and recursions  Suppose that $B(x) = P(x)/Q(x)$ where
$P(x)$ and $Q(x)$ are polynomials. Expressions that involve division are usually not easy to expand
unless the divisor is a product of linear factors with integer coefficients. Thus, we would usually
rewrite our equation as $Q(x)B(x) = P(x)$ and then equate coefficients. This gives us a recursion for
the $b_i$'s which is linear and has constant coefficients.

The description of the procedure is a bit vague, so let's look at an example. When we study systems
of recursions in the next chapter, we will show that the number of ways to place nonoverlapping
dominoes on a 2 by $n$ board has the generating function

\[ C(x) = \frac{1 - x}{1 - 3x - x^2 + x^3}. \]

Thus $P(x) = 1 - x$ and $Q(x) = 1 - 3x - x^2 + x^3$. Using our plan, we have

\[ (1 - 3x - x^2 + x^3)C(x) = 1 - x. \tag{10.19} \]

There  are  now  various  ways  we  can  proceed: 

Keep all subscripts nonnegative: When $n \ge 3$, the coefficient of $x^n$ on the right side is 0 and
the coefficient on the left side is $c_n - 3c_{n-1} - c_{n-2} + c_{n-3}$, so all the subscripts are nonnegative.
Rearranging this,

\[ c_n = 3c_{n-1} + c_{n-2} - c_{n-3} \quad \text{for } n \ge 3. \]

The values of $c_0$, $c_1$ and $c_2$ are given by initial conditions. Looking at the coefficients of $x^0$, $x^1$ and
$x^2$ on both sides of (10.19), we have

\[ c_0 = 1, \qquad c_1 - 3c_0 = -1, \qquad c_2 - 3c_1 - c_0 = 0. \]

Solving we have $c_0 = 1$, $c_1 = 2$ and $c_2 = 7$. (You might want to try deriving the recursion directly.
It's not easy, but it's not an unreasonable problem for you at this time.)

Allow negative subscripts: We now allow negative subscripts, with the understanding that $c_n = 0$
if $n < 0$. Proceeding as above, we get $c_n - 3c_{n-1} - c_{n-2} + c_{n-3} = 0$ provided $n \ge 2$. Thus we get
the same recursion, but now $n \ge 2$ and the initial conditions are only $c_0 = 1$ and $c_1 = 2$ since $c_2$ is
given by the recursion.

Avoid initial conditions: Now we not only allow negative subscripts, we also do not restrict $n$.
From (10.19) we have

\[ c_n - 3c_{n-1} - c_{n-2} + c_{n-3} = b_n, \quad \text{where } b_n = [x^n](1 - x). \]

Thus we have the recursion

\[ c_n = 3c_{n-1} + c_{n-2} - c_{n-3} + b_n \quad \text{for } n \ge 0, \]

where $b_0 = 1$, $b_1 = -1$ and $b_n = 0$ otherwise. □
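
The "avoid initial conditions" form translates directly into a short program. The Python sketch below is our own illustration (the function name and the list representation of polynomials are assumptions, not part of the text). Given the coefficient lists of $P(x)$ and $Q(x)$ with constant term $Q(0) = 1$, it equates coefficients in $Q(x)C(x) = P(x)$, treating coefficients with negative subscripts as zero.

```python
def series_coefficients(p, q, n_terms):
    """First n_terms power series coefficients of P(x)/Q(x).

    p and q list the coefficients of P and Q, constant term first;
    q[0] must equal 1 so that no division is needed.
    """
    if q[0] != 1:
        raise ValueError("this sketch assumes Q(0) = 1")
    c = []
    for n in range(n_terms):
        b_n = p[n] if n < len(p) else 0            # b_n = [x^n] P(x)
        # Coefficient of x^n in Q(x)C(x) is c_n + sum_{i>=1} q_i c_{n-i}.
        tail = sum(q[i] * c[n - i] for i in range(1, min(n, len(q) - 1) + 1))
        c.append(b_n - tail)
    return c


if __name__ == "__main__":
    # C(x) = (1 - x) / (1 - 3x - x^2 + x^3) from the example above.
    print(series_coefficients([1, -1], [1, -3, -1, 1], 6))
```

For the generating function of this example the first three values produced are 1, 2 and 7, in agreement with the initial conditions found above.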

The ideas are not limited to ratios of polynomials, but then it's not always clear how to proceed.
In the next example, we use the fact that $e^{-x}$ has a simple power series.


Example 10.7  Derangements  In the next chapter, we obtain, as (11.17), the formula

\[ D(x) = \sum_{n=0}^{\infty} D_n x^n / n! = \frac{e^{-x}}{1 - x}; \tag{10.20} \]

in other words, $e^{-x}/(1 - x)$ is the ordinary generating function for the numbers $d_n = D_n/n!$. We
can get rid of fractions in (10.20) by multiplying by $(1 - x)$. Since

\[ e^{-x} = \sum_{n=0}^{\infty} \frac{(-1)^n x^n}{n!}, \]

equating coefficients of $x^n$ on both sides of $(1 - x)D(x) = e^{-x}$ gives us

\[ \frac{D_n}{n!} - \frac{D_{n-1}}{(n-1)!} = \frac{(-1)^n}{n!}. \]

Rearranging leads to the recursion $D_n = nD_{n-1} + (-1)^n$. A direct combinatorial proof of this
recursion is known, but it is difficult. □
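
The recursion $D_n = nD_{n-1} + (-1)^n$ can be compared with a direct count. In the Python sketch below (our own illustration with hypothetical function names) we assume the usual meaning of $D_n$, the number of permutations of $n$ objects with no fixed points, and the starting value $D_0 = 1$, which matches the constant term of $e^{-x}/(1 - x)$.

```python
from itertools import permutations


def derangements_by_enumeration(n):
    """Count permutations of 0, ..., n-1 that leave no element in place."""
    return sum(
        1
        for perm in permutations(range(n))
        if all(perm[i] != i for i in range(n))
    )


def derangements_by_recursion(n):
    """Compute D_n from D_n = n D_{n-1} + (-1)^n, starting with D_0 = 1."""
    d = 1
    for m in range(1, n + 1):
        d = m * d + (-1) ** m
    return d


if __name__ == "__main__":
    for n in range(8):
        assert derangements_by_enumeration(n) == derangements_by_recursion(n)
    print([derangements_by_recursion(n) for n in range(8)])
```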


One method for solving a differential equation is to write the unknown function as a power
series $y(x) = \sum a_n x^n$, use the differential equation to obtain a recursion for the $a_n$, and finally
use the recursion to obtain information about the $a_n$'s and hence $y(x)$. Here we proceed differently.
Sometimes a recursion may lead to a differential equation which can be solved to obtain the generating
function. Sometimes a differential equation can be found for a known generating function and then
be used to obtain a recursion. We consider the latter approach in the next example. What sort
of differential equation should we look for? Linear equations with polynomial coefficients give the
simplest recursions.
Example 10.8  A recursion for unlabeled full binary RP-trees  In Example 10.5 we found
that the generating function for unlabeled full binary RP-trees is $B(x) = (1 - \sqrt{1 - 4x})/2$. We then obtained
an explicit formula for $b_n$ by expanding $\sqrt{1 - 4x}$ in a power series. Instead, we could obtain a
differential equation which would lead to a recursion.

We can proceed in various ways to obtain a simple differential equation. One is to observe that
$2B(x) - 1 = -(1 - 4x)^{1/2}$ and differentiate both sides to obtain $2B'(x) = 2(1 - 4x)^{-1/2}$. Multiply
by $1 - 4x$:

\[ 2(1 - 4x)B'(x) = 2(1 - 4x)^{1/2} = -2(2B(x) - 1). \]

Thus $2B'(x) - 8xB'(x) + 4B(x) = 2$. Replacing $B(x)$ by its power series we obtain

\[ \sum 2n b_n x^{n-1} - \sum 8n b_n x^n + \sum 4 b_n x^n = 2. \]

Replacing the first sum by $\sum 2(n + 1) b_{n+1} x^n$ and equating coefficients of $x^n$ gives

\[ 2(n + 1)b_{n+1} - 8n b_n + 4b_n = 0 \quad \text{for } n > 0. \]

After some rearrangement, $b_{n+1} = (4n - 2)b_n/(n + 1)$ for $n > 0$. We already know that $b_1 = 1$, so we
have the initial condition for the recursion. This recursion was obtained in Exercise 9.3.13 (p. 266)
by a counting argument. □
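
The recursion $b_{n+1} = (4n - 2)b_n/(n + 1)$ needs only one earlier value, while the convolution recursion (10.17) needs all of them. The Python sketch below (our own illustration; the function names are hypothetical) computes the sequence both ways, starting from $b_0 = 0$ and $b_1 = 1$, and checks that the results agree.

```python
def trees_by_linear_recursion(n_max):
    """Return [b_0, ..., b_{n_max}] using b_{n+1} = (4n - 2) b_n / (n + 1)."""
    b = [0, 1]  # b_0 = 0, b_1 = 1
    for n in range(1, n_max):
        # The division is exact because every b_n is an integer.
        b.append((4 * n - 2) * b[n] // (n + 1))
    return b[: n_max + 1]


def trees_by_convolution(n_max):
    """Return [b_0, ..., b_{n_max}] using b_n = sum_{k=1}^{n-1} b_k b_{n-k}."""
    b = [0, 1]
    for n in range(2, n_max + 1):
        b.append(sum(b[k] * b[n - k] for k in range(1, n)))
    return b[: n_max + 1]


if __name__ == "__main__":
    assert trees_by_linear_recursion(20) == trees_by_convolution(20)
    print(trees_by_linear_recursion(8))
```

The linear recursion uses a number of arithmetic operations proportional to $n$, whereas the convolution uses a number proportional to $n^2$.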


Derivatives,  Averages  and  Probability 


The fact that $xA'(x) = \sum n a_n x^n$ can be quite useful in obtaining information about averages. We'll
explain how this works and then look at some examples.

Let $\mathcal{A}_n$ be a set of objects of size $n$; for example, some kind of $n$-long sequences or some kind of
$n$-vertex trees. For each $n$, make $\mathcal{A}_n$ into a probability space using the uniform distribution:

\[ \Pr(\alpha) = \frac{1}{|\mathcal{A}_n|} \quad \text{for all } \alpha \in \mathcal{A}_n. \]

(Probability is discussed in Appendix C.) Suppose that for each $n$ we have a random variable $X_n$ on
$\mathcal{A}_n$ that counts something; for example, the number of ones in a sequence or the number of leaves
on a tree. The average value (average number of ones or average number of leaves) is then $E(X_n)$.

Now let's look at this in generating function terms. Let $a_{n,k}$ be the number of $\alpha \in \mathcal{A}_n$ with
$X_n(\alpha) = k$; for example, the number of $n$-long sequences with $k$ ones or the number of $n$-vertex trees
with $k$ leaves. Let $A(x, y)$ be the generating function $\sum_{n,k} a_{n,k} x^n y^k$. By the definition of expectation
and simple algebra,

\[ E(X_n) = \sum_k k \Pr(X_n = k) = \sum_k k \frac{a_{n,k}}{|\mathcal{A}_n|} = \frac{\sum_k k a_{n,k}}{\sum_k a_{n,k}}. \]

Let's look at the two sums in the last fraction.

Since $[x^n] A(x, y) = \sum_k a_{n,k} y^k$, we have $\sum_k a_{n,k} = [x^n] A(x, 1)$.

Since $[x^n] \partial A/\partial y = \sum_k k a_{n,k} y^{k-1}$, we have $\sum_k k a_{n,k} = [x^n] A_y(x, 1)$,
where $A_y$ stands for $\partial A/\partial y$. Putting this all together,

\[ E(X_n) = \frac{[x^n] A_y(x, 1)}{[x^n] A(x, 1)}. \tag{10.21} \]

We can use the same idea to compute variance. Recall that $\operatorname{var}(X_n) = E(X_n^2) - E(X_n)^2$. Since
(10.21) tells us how to compute $E(X_n)$, all we need is a formula for $E(X_n^2)$. This is just like the
previous derivation except we need factors of $k^2$ multiplying $a_{n,k}$. We can get this by differentiating
twice:

\[ \sum_k k^2 a_{n,k} = [x^n] \left. \frac{\partial (y A_y(x, y))}{\partial y} \right|_{y=1} = [x^n] \bigl( A_{yy}(x, 1) + A_y(x, 1) \bigr). \tag{10.22} \]

This  discussion  has  all  been  rather  abstract.  Let's  apply  it. 

Example 10.9  Fibonacci sequences  What is the average number of ones in an $n$ long sequence
of zeroes and ones containing no adjacent ones? We studied these sequences in Example 10.2 (p. 275),
where we used the notation $F_n$. To be more in keeping with the previous discussion, let $f_{n,k}$ be
the number of $n$ long sequences containing exactly $k$ ones. We need $F(x, y) = \sum_{n,k} f_{n,k} x^n y^k$.

In Example 10.16 we'll see how to compute $F(x, y)$ quickly, but for now the only tool we have
is recursions, so it will take a bit longer. You should be able to extend the argument used to derive
the recursion (10.12) to show that

\[ f_{n,k} = f_{n-1,k} + f_{n-2,k-1} \quad \text{for } n \ge 2, \tag{10.23} \]

provided we set $f_{n,k} = 0$ when $k < 0$. Let $F_n(y) = \sum_k f_{n,k} y^k$ and sum $y^k$ times (10.23) over all $k$
to obtain

\[ F_n(y) = F_{n-1}(y) + yF_{n-2}(y) \quad \text{for } n \ge 2. \tag{10.24} \]

For $n = 0$ we have only the empty sequence and for $n = 1$ we have the two sequences 0 and 1. Thus,
the initial conditions for (10.24) are $F_0(y) = 1$ and $F_1(y) = 1 + y$. Multiplying (10.24) by $x^n$ and
summing over $n \ge 2$, we obtain

\[ F(x, y) - F_0(y) - xF_1(y) = x\bigl(F(x, y) - F_0(y)\bigr) + x^2 y F(x, y). \]

Thus

\[ F(x, y) = \frac{1 + xy}{1 - x - x^2 y}. \tag{10.25} \]

We are now ready to use (10.21). From (10.25),

\[ F_y(x, y) = \frac{x(1 - x - x^2 y) - (1 + xy)(-x^2)}{(1 - x - x^2 y)^2} = \frac{x}{(1 - x - x^2 y)^2} \]

and so

\[ F_y(x, 1) = \frac{x}{(1 - x - x^2)^2}. \]

Thus

\[ \sum_k k f_{n,k} = [x^n] F_y(x, 1) = [x^{n-1}] \frac{1}{(1 - x - x^2)^2}. \]

This can be expanded by partial fractions in various ways. The easiest method is probably to use
the ideas and formulas in Appendix D (p. 387), which we now do. With $a, b = (1 \pm \sqrt5)/2$, as in
Example 10.2, we have

$$\frac{1}{(1 - x - x^2)^2} = \frac{1}{(1 - ax)^2(1 - bx)^2}.$$

We make use of the relations

$$a + b = 1, \quad ab = -1 \quad\text{and}\quad a - b = \sqrt5.$$

Here are the calculations:

$$\begin{aligned}
\frac{1}{(1 - ax)^2(1 - bx)^2}
&= \left(\frac{a/\sqrt5}{1 - ax} - \frac{b/\sqrt5}{1 - bx}\right)^2 \\
&= \frac{a^2/5}{(1 - ax)^2} - \frac{2ab/5}{(1 - ax)(1 - bx)} + \frac{b^2/5}{(1 - bx)^2} \\
&= \frac{a^2/5}{(1 - ax)^2} + \frac{2a/(5\sqrt5)}{1 - ax} - \frac{2b/(5\sqrt5)}{1 - bx} + \frac{b^2/5}{(1 - bx)^2}.
\end{aligned}$$

Thus

$$\sum_k k f_{n,k} = \frac{na^{n+1}}{5} + \frac{2a^n}{5\sqrt5} - \frac{2b^n}{5\sqrt5} + \frac{nb^{n+1}}{5}.$$

Since $|b| < .62$, the last two terms in this expression are fairly small. In fact, we will show that
$\sum_k k f_{n,k}$ is the integer closest to

$$\frac{a^n}{5}\left(an + \frac{2}{\sqrt5}\right).$$

Using the expression (10.15) for $\sum_k f_{n,k}$, the average number of ones is very close to

$$\frac{n}{a\sqrt5} + \frac{2}{5a^2}.$$

We must prove our claim about the smallness of the terms involving b. It suffices to show that
the sum of their absolute values is less than 1/2. Since $|b| = (\sqrt5 - 1)/2 < 1$, we have, for $n \ge 1$,

$$\frac{2|b|^n}{5\sqrt5} \le \frac{2|b|}{5\sqrt5} < 0.12.$$

The term $n|b|^{n+1}/5$ is a bit more complicated. We study it as a function of n to find its maximum.
Its derivative with respect to n is

$$\frac{|b|^{n+1}}{5} + \frac{n\ln|b|\,|b|^{n+1}}{5} = \frac{|b|^{n+1}}{5}\bigl(1 + n\ln|b|\bigr).$$

Since $-0.5 < \ln|b| < -0.4$, this is positive for $n \le 2$ and negative for $n \ge 3$. It follows that the
term achieves its maximum at n = 2 or at n = 3. The values of these two terms are

$$2|b|^3/5 < 0.1 \quad\text{and}\quad 3|b|^4/5 < 0.1,$$

proving our claim. □
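The closest-integer claim of Example 10.9 is easy to test numerically. The sketch below is only a
check under our own naming: it obtains the total number of ones by setting y = 1 in (10.24) and in its
derivative with respect to y, then compares the result with the rounded value of $(a^n/5)(an + 2/\sqrt5)$
and with the approximate average $n/(a\sqrt5) + 2/(5a^2)$.

```python
from math import sqrt

def counts_and_ones(nmax):
    """Return (c, o): c[n] = number of n long 0/1 sequences with no adjacent ones,
    o[n] = total number of ones in all of them.
    From (10.24) at y = 1:             c[n] = c[n-1] + c[n-2].
    From d/dy of (10.24) at y = 1:     o[n] = o[n-1] + o[n-2] + c[n-2]."""
    c, o = [1, 2], [0, 1]
    for n in range(2, nmax + 1):
        c.append(c[n - 1] + c[n - 2])
        o.append(o[n - 1] + o[n - 2] + c[n - 2])
    return c[:nmax + 1], o[:nmax + 1]

def closest_integer_claim(n):
    """Integer closest to (a^n / 5) * (a*n + 2/sqrt(5)), in floating point."""
    a = (1 + sqrt(5)) / 2
    return round(a ** n / 5 * (a * n + 2 / sqrt(5)))

def approximate_average(n):
    """The approximation n/(a*sqrt(5)) + 2/(5*a^2) for the average number of ones."""
    a = (1 + sqrt(5)) / 2
    return n / (a * sqrt(5)) + 2 / (5 * a * a)

c, o = counts_and_ones(12)
for n in range(13):
    print(n, o[n], closest_integer_claim(n), o[n] / c[n], approximate_average(n))
```

Floating point is adequate here only for moderate n; for large n exact integer arithmetic would be
the safer choice.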

Example  10.10  Leaves  in  trees  What  can  we  say  about  the  number  of  leaves  in  n-vertex 
unlabeled  RP-trees?  We'll  study  the  average  number  of  leaves  and  the  variance  using  (10.21)  and 
(10.22). 

Let $t_{n,k}$ be the number of unlabeled RP-trees having n vertices and k leaves and let $T(x,y)$ be
$\sum_{n,k} t_{n,k}x^ny^k$. Using tools at our disposal, it is not easy to work out the generating function
$T(x,y)$. On the other hand, after you have read the next section, you should be able to show that

$$T(x,y) = xy + xT(x,y) + x(T(x,y))^2 + \cdots + x(T(x,y))^i + \cdots,$$

where $x(T(x,y))^i$ comes from building trees whose roots have degree i. We'll assume this has been
done. Summing the geometric series in this equation, we have

$$T(x,y) = xy + \frac{xT(x,y)}{1 - T(x,y)}.$$
Clearing of fractions and rearranging:

$$(T(x,y))^2 - (1 - x + xy)\,T(x,y) + xy = 0,$$

a quadratic equation in $T(x,y)$ whose solution is

$$T(x,y) = \frac{1 - x + xy \pm \sqrt{(1 - x + xy)^2 - 4xy}}{2} = \frac{1 - x + xy \pm \sqrt{(1 + x - xy)^2 - 4x}}{2}.$$

Do we use the plus sign or the minus sign? Since there are no trees with no vertices, $t_{0,0} = 0$. On the
other hand,

$$t_{0,0} = T(0,0) = \frac{1 \pm 1}{2}$$

and so we want the minus sign. We finally have $T(x,y)$. Let's multiply by 2 to get rid of the annoying
fraction:

$$2T(x,y) = 1 - x + xy - \bigl((1 + x - xy)^2 - 4x\bigr)^{1/2}.$$

Differentiating with respect to y, we have

$$2T_y(x,y) = x + x(1 + x - xy)\bigl((1 + x - xy)^2 - 4x\bigr)^{-1/2}$$

and

$$2T_{yy}(x,y) = -x^2\bigl((1 + x - xy)^2 - 4x\bigr)^{-1/2} + x^2(1 + x - xy)^2\bigl((1 + x - xy)^2 - 4x\bigr)^{-3/2}.$$

Thus

$$\begin{aligned}
2T(x,1) &= 1 - (1 - 4x)^{1/2}, \\
2T_y(x,1) &= x + x(1 - 4x)^{-1/2}, \\
2T_{yy}(x,1) &= -x^2(1 - 4x)^{-1/2} + x^2(1 - 4x)^{-3/2}.
\end{aligned}$$
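For $n \ge 2$ we have

$$\begin{aligned}
2[x^n]\,T(x,1) &= -(-4)^n\binom{1/2}{n}, \\
2[x^n]\,T_y(x,1) &= [x^{n-1}](1 - 4x)^{-1/2} = (-4)^{n-1}\binom{-1/2}{n-1}, \\
2[x^n]\,T_{yy}(x,1) &= -[x^{n-2}](1 - 4x)^{-1/2} + [x^{n-2}](1 - 4x)^{-3/2} \\
&= -(-4)^{n-2}\binom{-1/2}{n-2} + (-4)^{n-2}\binom{-3/2}{n-2}.
\end{aligned}$$

Let $X_n$ be the number of leaves in a random n-vertex tree and suppose $n \ge 2$. Then

$$E(X_n) = \frac{2[x^n]\,T_y(x,1)}{2[x^n]\,T(x,1)} = \frac{\binom{-1/2}{n-1}}{4\binom{1/2}{n}}
= \frac{(-1/2)(-3/2)\cdots(-1/2-(n-2))\,/\,(n-1)!}{4\,(1/2)(-1/2)\cdots(1/2-(n-1))\,/\,n!}
= \frac{n!}{4(1/2)\,(n-1)!} = \frac n2$$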
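and, recalling (10.22),

$$\begin{aligned}
E(X_n^2) &= \frac{[x^n]\,T_{yy}(x,1)}{[x^n]\,T(x,1)} + \frac{[x^n]\,T_y(x,1)}{[x^n]\,T(x,1)}
= \frac{\binom{-1/2}{n-2}}{4^2\binom{1/2}{n}} - \frac{\binom{-3/2}{n-2}}{4^2\binom{1/2}{n}} + \frac n2 \\
&= \frac{(-1/2)\cdots(-1/2-(n-3))\,/\,(n-2)!}{4^2(1/2)\cdots(1/2-(n-1))\,/\,n!}
 - \frac{(-3/2)\cdots(-3/2-(n-3))\,/\,(n-2)!}{4^2(1/2)\cdots(1/2-(n-1))\,/\,n!} + \frac n2 \\
&= \frac{n!}{4^2(1/2)(1/2-(n-1))\,(n-2)!} - \frac{n!}{4^2(1/2)(-1/2)\,(n-2)!} + \frac n2 \\
&= \frac{n(n-1)}{4(3-2n)} + \frac{n(n-1)}{4} + \frac n2
= \frac{n^2+n}{4} - \frac{n(n-1)}{4(2n-3)}.
\end{aligned}$$

Thus

$$\begin{aligned}
\operatorname{var}(X_n) = E(X_n^2) - (E(X_n))^2
&= \frac{n^2+n}{4} - \frac{n(n-1)}{4(2n-3)} - \frac{n^2}{4} \\
&= \frac n4 - \frac{n(n-1)}{4(2n-3)} = \frac{n\bigl((2n-3) - (n-1)\bigr)}{4(2n-3)} = \frac{n(n-2)}{4(2n-3)}.
\end{aligned}$$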

For  large  n  this  is  nearly  n/8. 

We've shown that the average number of leaves in an RP-tree is n/2 and the variance in the
number of leaves is about n/8. By Chebyshev's inequality (C.3) (p. 385), it follows that, in most
large RP-trees, about half the vertices are leaves. More precisely:

It is unlikely that $\dfrac{|(\text{number of leaves}) - n/2|}{\sqrt n}$ will be large.

By Exercise 5.4.8 (p. 140), every N-vertex full binary tree has exactly $(N+1)/2$ leaves, very slightly
larger than the average over all trees. Since a tree that has many edges out of nonleaf vertices will
have more leaves, it would seem that a full binary tree should have relatively few leaves. What is
going on? Random RP-trees must have many nonleaf vertices with only one child, counterbalancing
those with many children so that the average number of children of a nonleaf vertex comes out to be
nearly two. □
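The mean n/2 and the variance $n(n-2)/(4(2n-3))$ can be confirmed exactly for small n. The
following sketch is an independent check rather than the method of the example: it builds the
polynomials $\sum_k t_{n,k}y^k$ directly from the description "a tree is a root together with a sequence of
subtrees." All function names are our own.

```python
from fractions import Fraction

def poly_add(p, q):
    size = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(size)]

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def leaf_polynomials(nmax):
    """tree[n][k] = number of n-vertex unlabeled RP-trees with k leaves."""
    tree = {1: [0, 1]}      # the single vertex is itself a leaf: polynomial y
    forest = {0: [1]}       # forest[m]: sequences of trees with m vertices in all
    for n in range(2, nmax + 1):
        total = [0]
        for j in range(1, n):                     # first tree of the sequence has j vertices
            total = poly_add(total, poly_mul(tree[j], forest[n - 1 - j]))
        forest[n - 1] = total
        tree[n] = total                           # adding a root creates no new leaf
    return tree

def mean_and_variance(coeffs):
    total = sum(coeffs)
    mean = Fraction(sum(k * c for k, c in enumerate(coeffs)), total)
    second = Fraction(sum(k * k * c for k, c in enumerate(coeffs)), total)
    return mean, second - mean ** 2

trees = leaf_polynomials(8)
for n in range(2, 9):
    print(n, trees[n], mean_and_variance(trees[n]))
```

Exact rational arithmetic (`Fraction`) is used so that the comparison with the closed forms is not
blurred by rounding.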

*Example 10.11 Average distance to a leaf What is the average distance to a leaf in a
random full binary RP-tree?

Before answering this question, we need to say precisely what it means. If T is an unlabeled full
binary RP-tree, let $d(T)$ be the sum of the distances from the root to each of the leaves of the tree.
(The distance from the root to a leaf is the number of edges on the unique path joining them.) We
want the average value of $d(T)/n$ over all unlabeled n leaf full binary RP-trees. This average can
be important because many algorithms involve traversing such trees from the root to a leaf and the
time required is proportional to the distance.

Let $D(x) = \sum d(T)x^{w(T)}$, where the sum ranges over all unlabeled full binary RP-trees T and
$w(T)$ is the number of leaves in T. Let $B(x) = \sum x^{w(T)}$. By Example 10.5,

$$B(x) = \frac{1 - \sqrt{1 - 4x}}{2} \quad\text{and}\quad b_n = \frac1n\binom{2n-2}{n-1}.$$

Suppose that T has more than one leaf. Let $T_1$ and $T_2$ be the two principal subtrees of T; that
is, the two trees whose roots are the sons of the root of T. You should be able to show that

$$d(T) = w(T) + d(T_1) + d(T_2).$$

Multiply this by $x^{w(T)}$ and sum over all T with more than one leaf. Since $d(\bullet) = 0$ and
$w(T) = w(T_1) + w(T_2)$, we have

$$\begin{aligned}
D(x) &= \sum_{n>1} nb_nx^n + \sum_{T_1,T_2} d(T_1)x^{w(T_1)+w(T_2)} + \sum_{T_1,T_2} d(T_2)x^{w(T_1)+w(T_2)} \\
&= xB'(x) - x + D(x)B(x) + B(x)D(x).
\end{aligned}$$

Thus

$$D(x) = \frac{xB'(x) - x}{1 - 2B(x)} = \frac{1}{\sqrt{1-4x}}\left(\frac{x}{\sqrt{1-4x}} - x\right) = \frac{x}{1-4x} - \frac{x}{\sqrt{1-4x}}.$$

It follows that, with $d_n = [x^n]\,D(x)$,

$$d_n = 4^{n-1} - \binom{-1/2}{n-1}(-4)^{n-1} = 4^{n-1} + \frac n2\binom{1/2}{n}(-4)^n = 4^{n-1} - nb_n$$

and so the average distance to a leaf is

$$\frac{d_n}{nb_n} = \frac{4^{n-1}}{nb_n} - 1 = \frac{4^{n-1}}{\binom{2n-2}{n-1}} - 1.$$

Using Stirling's formula, it can be shown that this is asymptotic to $\sqrt{\pi n}$.

This number is fairly small compared to n. We could do much better by limiting ourselves to
averaging over certain subclasses of binary RP-trees. For example, we saw in Chapter 8 that if the
distances to the leaves of the tree are all about equal, then the average and largest distances are
both only about $\log_2 n$. Thus, when designing algorithms that use trees as data structures, restricting
the shape of the tree could lead to significant savings. Good information storage and retrieval
algorithms are designed on this basis. □
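The formula $d_n = 4^{n-1} - nb_n$ of Example 10.11 can be checked without generating functions by
summing the identity $d(T) = w(T) + d(T_1) + d(T_2)$ over all trees with n leaves. The sketch below does
this; the function name and the list layout are our own assumptions.

```python
from math import comb, pi, sqrt

def leaf_distance_totals(nmax):
    """Return (b, d) for nmax >= 1, where
    b[n] = number of n leaf unlabeled full binary RP-trees and
    d[n] = sum of d(T) over those trees.
    A tree with n > 1 leaves is a pair (T1, T2) with i and n - i leaves."""
    b = [0] * (nmax + 1)
    d = [0] * (nmax + 1)
    b[1] = 1                                   # the one-vertex tree, for which d = 0
    for n in range(2, nmax + 1):
        b[n] = sum(b[i] * b[n - i] for i in range(1, n))
        d[n] = n * b[n] + sum(d[i] * b[n - i] + b[i] * d[n - i] for i in range(1, n))
    return b, d

b, d = leaf_distance_totals(200)
for n in (2, 3, 10, 50, 200):
    average = d[n] / (n * b[n])
    print(n, average, sqrt(pi * n))            # the two columns approach the same growth rate
```

The recursion for `d` is just the displayed equation for $D(x)$ read coefficient by coefficient.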

*Example 10.12 The average time for Quicksort We want to find out how long it takes to
sort a list using Quicksort. Quicksort was discussed briefly in Chapter 8. We'll review it here. Given
a list $a_1, a_2, \ldots, a_n$, Quicksort selects an element x, divides the list into two parts (greater and less
than x) and sorts each part by calling itself. There are two problems. First, we haven't been specific
enough in our description. Second, the time Quicksort takes depends on the order of the list and
the way x is chosen at each call. To avoid the dependence on order, we will average over all possible
arrangements. We now give a more specific description using $x = a_1$. Given a list $a_1, a_2, \ldots, a_n$
of distinct elements, we create a new list $s_1, s_2, \ldots, s_n$ with the following properties.

(a) For some $1 \le k \le n$, $s_k = a_1$.

(b) $s_i < a_1$ for $i < k$ and $s_i > a_1$ for $i > k$.

(c) The relative order of the elements in the two sublists is the same as in the original list; i.e., if
$s_i = a_p$, $s_j = a_q$ and either $i < j < k$ or $k < i < j$, then $p < q$.

It turns out that this can be done with $n-1$ comparisons. We now apply Quicksort recursively to
$s_1, \ldots, s_{k-1}$ and to $s_{k+1}, \ldots, s_n$.

Let $q_n$ be the average number of comparisons needed to Quicksort an n long list. Thus $q_1 = 0$.
We define $q_0 = 0$ for convenience later.

Note that k is the position of $a_1$ in the sorted list. Since the original list is random, all values of
k from 1 to n are equally likely. By analyzing the algorithm carefully, it can be shown that all orderings
of $s_1, \ldots, s_{k-1}$ are equally likely as are all orderings of $s_{k+1}, \ldots, s_n$. (We will not do this.) Thus,
given k, it follows that the average length of time needed to sort both $s_1, \ldots, s_{k-1}$ and $s_{k+1}, \ldots, s_n$
is $q_{k-1} + q_{n-k}$.

Averaging over all possible values of k and remembering to include the original $n-1$ comparisons,
we obtain

$$q_n = n - 1 + \frac1n\sum_{k=1}^n (q_{k-1} + q_{n-k}) = n - 1 + \frac2n\sum_{j=0}^{n-1} q_j,$$

which is valid for n > 0.
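For small n the recursion for $q_n$ can be compared with a direct average over all n! orderings.
The sketch below is one possible rendering of the Quicksort described above, with pivot $x = a_1$ and
an order-preserving split. It charges $n-1$ comparisons per split, as the text states is possible,
rather than counting the comparisons Python actually performs. Function names are our own.

```python
from fractions import Fraction
from itertools import permutations

def comparisons(lst):
    """Comparisons charged to Quicksort on lst (distinct elements, pivot = first element)."""
    if len(lst) <= 1:
        return 0
    pivot = lst[0]
    smaller = [v for v in lst[1:] if v < pivot]    # original relative order is kept
    larger = [v for v in lst[1:] if v > pivot]
    return len(lst) - 1 + comparisons(smaller) + comparisons(larger)

def average_by_enumeration(n):
    """Exact average of comparisons() over all n! orderings of 0, ..., n-1."""
    perms = list(permutations(range(n)))
    return Fraction(sum(comparisons(list(p)) for p in perms), len(perms))

def average_by_recursion(nmax):
    """q_0, ..., q_nmax from q_n = n - 1 + (2/n) * (q_0 + ... + q_{n-1})."""
    q = [Fraction(0)]
    for n in range(1, nmax + 1):
        q.append(n - 1 + Fraction(2, n) * sum(q))
    return q

q = average_by_recursion(6)
for n in range(7):
    print(n, q[n], average_by_enumeration(n))
```

Enumeration grows like n!, so it is useful only as a check for very short lists.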
To solve this recursion by generating functions, we should let $Q(x) = \sum q_nx^n$ and use the
recursion to get a relation for $Q(x)$. If we simply substitute, we obtain

$$Q(x) = \sum_{n=1}^\infty\left(n - 1 + \frac2n\sum_{j=0}^{n-1} q_j\right)x^n. \tag{10.26}$$

If we try to manipulate this to simplify the double sum over n and j of $2q_jx^n/n$, we will run into
problems because of the n in the denominator. How can we deal with this?

One approach would be to multiply the original recursion by n before we use it. Another approach,
which it turns out is equivalent, is to differentiate (10.26) with respect to x. Which is better?
The latter is easier when we have a denominator as simple as n, but the former may be better when
we have more complicated expressions. We use the latter approach. Differentiating (10.26), we have

$$\begin{aligned}
Q'(x) &= \sum_{n=1}^\infty\Bigl((n-1)n + 2\sum_{j=0}^{n-1} q_j\Bigr)x^{n-1}
= \sum_{n=1}^\infty n(n-1)x^{n-1} + 2\sum_{n=1}^\infty\sum_{j=0}^{n-1} q_jx^{n-1} \\
&= \frac{2x}{(1-x)^3} + \frac{2Q(x)}{1-x},
\end{aligned}$$

where $Q(x)/(1-x)$ follows either by recognizing that we have a convolution or by applying
Exercise 10.1.6 (p. 274).

Rearranging, we see that we must solve the differential equation

$$Q'(x) - 2(1-x)^{-1}Q(x) = 2x(1-x)^{-3}, \tag{10.27}$$

which is known as a linear first order differential equation. This can be solved by standard methods
from the theory of differential equations. We leave it as an exercise to show that the solution is

$$Q(x) = \frac{-2\ln(1-x) - 2x + C}{(1-x)^2}, \tag{10.28}$$

where the constant C must be determined by an initial condition. Since $Q(0) = q_0 = 0$, we have
C = 0.

Using the Taylor series

$$-\ln(1-x) = \sum_{k=1}^\infty \frac{x^k}{k}$$

and some algebra, one eventually obtains

$$q_n = 2(n+1)\sum_{k=1}^n \frac1k - 4n. \tag{10.29}$$

Again, details are left as an exercise.

Using Riemann sum approximations, we have

$$\sum_{k=2}^{n} \frac1k < \int_1^n \frac{dx}{x} < \sum_{k=1}^{n-1} \frac1k,$$

from which it follows that the summation in (10.29) equals $\ln n + O(1)$. It follows that

$$q_n = 2n\ln n + O(n) \quad\text{as } n \to \infty. \tag{10.30}$$

This is not quite as small as the result $n\log_2 n$ that we obtained for worst case merge sorting of a list
of length $n = 2^k$; however, merge sorting requires an extra array but Quicksort does not because the
array $s_1, \ldots, s_n$ can simply replace the array $a_1, \ldots, a_n$. (Actually, merge sorting can be done "in
place" if more time is spent on merging. The Batcher sort is an in place merge sort.) You might like
to compare this with Exercise 8.2.10 (p. 238), where we obtained an estimate of $1.78n\ln n$ for $q_n$. □
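The closed form $q_n = 2(n+1)\sum_{k=1}^n 1/k - 4n$ can be compared with the recursion for $q_n$ in exact
arithmetic, and the growth rate $2n\ln n$ can be observed numerically. This is a check only, written
under our own function names, and it does not replace the derivation left as an exercise.

```python
from fractions import Fraction
from math import log

def q_closed_form(n):
    """q_n = 2(n+1) * H_n - 4n, where H_n = 1 + 1/2 + ... + 1/n."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n

def q_recursion(nmax):
    """q_0, ..., q_nmax from q_n = n - 1 + (2/n) * (q_0 + ... + q_{n-1})."""
    q = [Fraction(0)]
    for n in range(1, nmax + 1):
        q.append(n - 1 + Fraction(2, n) * sum(q))
    return q

for n in (10, 100, 1000):
    exact = float(q_closed_form(n))
    print(n, exact, 2 * n * log(n), exact / (2 * n * log(n)))
```

The printed ratio approaches 1 only slowly, which reflects the O(n) term in (10.30).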
Exercises


10.3.1. Let $D(x)$ be the "exponential" generating function for the number of derangements as in
Example 10.7. You'll use (10.20) to derive a linear differential equation with polynomial coefficients for
$D(x)$. Then you'll equate coefficients to get a recursion for $D_n$.

(a) Differentiate $(1-x)D(x) = e^{-x}$ and then use $e^{-x} = (1-x)D(x)$ to eliminate $e^{-x}$.

(b) Equate coefficients to obtain $D_{n+1} = n(D_n + D_{n-1})$ for n > 0. What are the initial
conditions?

10.3.2. A "path" of length n is a sequence $0 = u_0, u_1, \ldots, u_n = 0$ of nonnegative integers such that
$u_{k+1} - u_k \in \{-1, 0, 1\}$ for $k < n$. Let $a_n$ be the number of such paths of length n. The OGF for $a_n$
can be shown to be $A(x) = (1 - 2x - 3x^2)^{-1/2}$.

(a) Show that $(1 - 2x - 3x^2)A'(x) = (1 + 3x)A(x)$.

(b) Obtain the recursion

$$(n+1)a_{n+1} = (2n+1)a_n + 3na_{n-1} \quad\text{for } n > 0.$$

What are the initial conditions?

(c) Use the general binomial theorem to expand $(1 - (2x + 3x^2))^{-1/2}$ and then the binomial theorem
to expand $(2x + 3x^2)^k$. Finally look at the coefficient of $x^n$ to obtain $a_n$ as a sum involving
binomial coefficients.

10.3.3.  Fill  in  the  steps  in  the  derivation  of  the  average  time  formula  for  Quicksort: 

(a)  Solve  (10.27)  to  obtain  (10.28)  by  using  an  integrating  factor  or  any  other  method  you 
wish. 

(b)  Obtain  (10.29)  from  (10.28). 

10.3.4. In Exercise 10.2.6, you derived the formula

$$T(x) = \frac{1 - x - \sqrt{1 - 2x - 3x^2}}{2x}.$$

Use the methods of this section to derive a recursion for $t_n$ that is simpler than the summation in
Exercise 10.2.6(a).

Hint. Since the manipulations involve a fair bit of algebra, it's a good idea to check your recursion
for $t_n$ by comparing it with actual values for small n. They can be determined by constructing the
trees.
Before the 1960's, combinatorial constructions and generating function equations were, at best,
poorly integrated. A common route to a generating function was:

1.  Obtain  a  combinatorial  description  of  how  to  construct  the  structures  of  interest;  e.g.,  the 
recursive  description  of  unlabeled  full  binary  RP-trees. 

2. Translate the combinatorial description into equations relating elements of the sequence that
enumerate the objects; e.g., $b_n = \sum_{k=1}^{n-1} b_kb_{n-k}$, for n > 1 and $b_1 = 1$.

3.  Introduce  a  generating  function  for  the  sequence  and  substitute  the  equations  into  the  generating 
function.  Apply  algebraic  manipulation. 

4.  The  result  is  a  relation  for  the  generating  function. 

From the 1960's on, various people have developed methods for going directly from a combinatorial
construction to a generating function expression, eliminating Steps 2 and 3. These methods often
allow us to proceed from Step 1 directly to Step 4. The Rules of Sum and Product for generating
functions are basic tools in this approach. We study them in this section.

So far we have been thinking of generating functions as being associated with a sequence of
numbers $a_0, a_1, \ldots$ which usually happen to be counting something. It is often helpful to think more
directly about what is being counted. For example, let $\mathcal{B}$ be the set of unlabeled full binary RP-trees.
For $B \in \mathcal{B}$, let $w(B)$ be the number of leaves of B. Then $b_n$ is simply the number of $B \in \mathcal{B}$ with
$w(B) = n$ and so

$$\sum_{B\in\mathcal{B}} x^{w(B)} = \sum_n b_nx^n = B(x). \tag{10.31}$$

We say that $B(x)$ counts unlabeled full binary RP-trees by number of leaves. It is sometimes convenient
to refer to the generating function by the set that is associated with it. In this case, the set is
$\mathcal{B}$ so we use the notation $G_{\mathcal{B}}(x)$ or simply $G_{\mathcal{B}}$. Thus, instead of asking for the generating function
for the $b_n$'s, we can just as well ask for the generating function for unlabeled full binary RP-trees
(by number of leaves). Similarly, instead of asking for the generating function for $F_n$, we can ask for
the generating function for sequences of zeroes and ones with no adjacent ones (by the length of the
sequence). When it is clear, we may omit the phrase "by number of leaves," or whatever it is we are
counting things by. We could also keep track of more than one thing simultaneously, like the length
of a sequence and the number of ones. We won't pursue that now.

As noted above, if $\mathcal{T}$ is some set of structures (e.g., $\mathcal{T} = \mathcal{B}$), we let $G_{\mathcal{T}}$ be the generating function
for $\mathcal{T}$, with respect to whatever we are counting the structures in $\mathcal{T}$ by (e.g., leaves in (10.31)).

The  Rule  of  Sum  for  generating  functions  is  nothing  more  than  a  restatement  of  the  Rule  of 
Sum  for  counting  that  we  developed  in  Chapter  1.  The  Rule  of  Product  is  a  bit  more  complex. 
At  this  point,  you  may  find  it  helpful  to  look  back  at  the  Rules  of  Sum  and  Product  for  counting: 
Theorem  1.2  (p.  6)  and  Theorem  1.3  (p.  8). 

Theorem 10.3 Rule of Sum Suppose a set $\mathcal{T}$ of structures can be partitioned into sets
$\mathcal{T}_1, \ldots, \mathcal{T}_j$ so that each structure in $\mathcal{T}$ appears in exactly one $\mathcal{T}_i$. It then follows that

$$G_{\mathcal{T}}(x) = G_{\mathcal{T}_1}(x) + \cdots + G_{\mathcal{T}_j}(x).$$

The Rule of Sum remains valid when the number of blocks in the partition $\mathcal{T}_1, \mathcal{T}_2, \ldots$ is infinite.

Theorem 10.4 Rule of Product Let w be a function that counts something in structures.
Suppose each T in a set $\mathcal{T}$ of structures is constructed from a sequence $T_1, \ldots, T_k$ of k structures
such that

(i) the possible structures $T_i$ for the ith choice may depend on previous choices, but the generating
function for them does not depend on previous choices,

(ii) each structure arises in exactly one way in this process and

(iii) if the structure T comes from the sequence $T_1, \ldots, T_k$, then

$$w(T) = w(T_1) + \cdots + w(T_k).$$

It then follows that

$$G_{\mathcal{T}}(x) = \sum_{T\in\mathcal{T}} x^{w(T)} = G_1(x)\,G_2(x)\cdots G_k(x), \tag{10.32}$$

where $G_i$ is the generating function for the possible choices for the ith structure.
The Rule of Product remains valid when the number of steps is infinite.
As with the Rule of Product for counting, the available choices for the ith step may depend on the
previous choices, but the generating function must not. If the choices at the ith step do not depend
on the previous choices, we can think of $\mathcal{T}$ as simply a Cartesian product $\mathcal{T}_1 \times \cdots \times \mathcal{T}_k$.

The additivity condition (iii) is needed to ensure that multiplication works correctly, namely

$$x^{w(T_1)} \cdots x^{w(T_k)} = x^{w(T_1) + \cdots + w(T_k)} = x^{w(T)}.$$

Weights  that  count  things  (e.g.,  leaves  in  trees,  cycles  in  a  permutation,  size  of  set  being  partitioned) 
usually  satisfy  (iii).  This  is  not  always  the  case;  for  example,  counting  the  number  of  distinct  things 
(e.g.,  cycle  lengths  in  a  permutation)  is  usually  not  additive.  Weights  dealing  with  a  maximum  (e.g., 
longest  path  from  root  to  leaf  in  a  tree,  longest  cycle  in  a  permutation)  do  not  satisfy  (iii). 

Proof: We will prove (10.32) by induction on k, starting with k = 2. The induction step is
practically trivial: simply group the first $k-1$ choices together as one choice, apply the theorem
for k = 2 to this grouped choice and the kth choice, and then apply the theorem for $k-1$ to the
grouped choice.

The proof for k = 2 can be obtained by applying the Rules of Sum and Product for counting
as follows. Let $t_{i,j}$ be the number of ways to choose the ith structure so that it contains exactly j of
the objects we are counting; that is, the number of ways to choose $T_i$ so that $w(T_i) = j$. The number
of ways to choose $T_1$ so that it contains j objects AND then choose $T_2$ so that together $T_1$ and $T_2$
contain n objects is $t_{1,j}\,t_{2,n-j}$. Thus, the total number of structures in $\mathcal{T}$ that contain exactly n
objects is

$$\sum_{j=0}^n t_{1,j}\,t_{2,n-j}.$$

Multiplying by $x^n$, summing over n and recognizing that we have a convolution, we obtain (10.32)
for k = 2.

Compare the proof we have just given for k = 2 with the following. By hypotheses (ii) and (iii)
of the theorem,

$$G_{\mathcal{T}}(x) = \sum_{T\in\mathcal{T}} x^{w(T)} = \sum_{T_1}\sum_{T_2} x^{w(T_1)+w(T_2)} = \sum_{T_1} x^{w(T_1)}\Bigl(\sum_{T_2} x^{w(T_2)}\Bigr).$$

By hypothesis (i), the inner sum equals $G_2$ even though the set of choices for $T_2$ may depend on $T_1$.
Thus the above expression becomes $G_1G_2$. While this might seem almost magical, it's a perfectly valid
proof. The lesson here is that it's often easier to sum over structures than to sum over indices.

Passing to the infinite case in the theorems is essentially a matter of taking a limit. We omit
the proof. □
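The convolution in the proof can be watched at work on a small example. In the sketch below a
generating function is stored as a table from weights to counts, and the generating function of a
Cartesian product is compared with the product of the generating functions. The helper names and the
choice of 0/1 strings weighted by their number of ones are our own illustration.

```python
from collections import Counter
from itertools import product

def generating_function(structures, weight):
    """Table {n: number of structures of weight n}, i.e. the coefficients of G(x)."""
    return Counter(weight(s) for s in structures)

def multiply(g1, g2):
    """Product of two generating functions given as coefficient tables (a convolution)."""
    result = Counter()
    for i, a in g1.items():
        for j, b in g2.items():
            result[i + j] += a * b
    return result

first = list(product((0, 1), repeat=3))       # choices for T_1: 0/1 strings of length 3
second = list(product((0, 1), repeat=2))      # choices for T_2: 0/1 strings of length 2
pairs = [(s, t) for s in first for t in second]

g1 = generating_function(first, sum)          # weight = number of ones, which is additive
g2 = generating_function(second, sum)
additive = generating_function(pairs, lambda p: sum(p[0]) + sum(p[1]))
maximum = generating_function(pairs, lambda p: max(sum(p[0]), sum(p[1])))

print(additive == multiply(g1, g2))           # additive weight: Rule of Product applies
print(maximum == multiply(g1, g2))            # a maximum violates condition (iii)
```

The second comparison illustrates why the additivity condition (iii) cannot be dropped.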

Example 10.13 Binomial coefficients Let's apply these theorems to enumerating binomial
coefficients. Our structures will be subsets of $\underline{n}$ and we will be keeping track of the number of
elements in a subset; i.e., $w(S) = |S|$, the number of elements in S. We form all subsets exactly once
by a sequence of n choices. The ith choice will be either $\emptyset$ (the empty set) or the set $\{i\}$. The union
of our choices will be a subset. The Rule of Product can be applied. Since $w(\emptyset) = 0$ and $w(\{i\}) = 1$,
$G_i(x) = 1 + x$ by the Rule of Sum. Thus the generating function for subsets of $\underline{n}$ by cardinality is
$(1 + x)\cdots(1 + x) = (1 + x)^n$. Compare this with the derivation in Example 1.14 (p. 19). Because
this problem is so simple and because you are not familiar with using our two theorems, you may
find the derivation in Example 1.14 easier than the one here. Read on. □
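The n-fold product $(1 + x)\cdots(1 + x)$ of Example 10.13 can be carried out one factor at a time
and compared with a direct count of subsets. The sketch below does this; the function names are our
own.

```python
from itertools import combinations
from math import comb

def subsets_by_size(n):
    """Coefficients of (1 + x)^n, built by n applications of the Rule of Product."""
    coeffs = [1]                                  # the empty product
    for _ in range(n):
        # multiply by G_i(x) = 1 + x: the ith choice adds nothing or adds the element i
        coeffs = [(coeffs[k] if k < len(coeffs) else 0) +
                  (coeffs[k - 1] if k >= 1 else 0)
                  for k in range(len(coeffs) + 1)]
    return coeffs

def count_subsets_directly(n):
    """Number of k element subsets of an n element set for k = 0, ..., n, by listing them."""
    return [sum(1 for _ in combinations(range(n), k)) for k in range(n + 1)]

print(subsets_by_size(5))
print(count_subsets_directly(5))
```

Both lists should agree with the binomial coefficients `comb(n, k)`.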
Example 10.14 Counting unlabeled RP-trees Let's look at unlabeled RP-trees from this
new vantage point. If a tree has more than one vertex, let $s_1, \ldots, s_k$ be the sons of the root from left
to right. We can describe such a tree by listing the k subtrees $T_1, \ldots, T_k$ whose roots are $s_1, \ldots, s_k$.
This gives us a k-tuple. Note that T has as many leaves as $T_1, \ldots, T_k$ together. In fact, if you look
back to the start of Chapter 9, you will see that this is nothing more nor less than the definition we
gave there.

Let $B(x)$ be the generating function for unlabeled full binary RP-trees by number
of leaves. By the previous paragraph, an unlabeled full binary RP-tree is either one vertex OR a
2-tuple of unlabeled full binary RP-trees (joined to a new root). Applying the Rules of Sum and
Product with j = k = 2, we have

$$G_{\mathcal{B}}(x) = x + G_{\mathcal{B}}(x)\,G_{\mathcal{B}}(x),$$

which can also be written

$$B(x) = x + B(x)B(x).$$

This is much easier than deriving the recursion first; compare this derivation with the one in
Example 10.5 (p. 279).

Now let's count arbitrary unlabeled RP-trees. In this case, we cannot count them by leaves
because there are an infinite number of trees with just one leaf: any path is such a tree. We'll count
them by vertices. Let $T(x)$ be the generating function. Proceeding as in the previous paragraph, we
say that such a tree is either a single vertex, OR one tree, OR a 2-tuple of trees, OR a 3-tuple of
trees, and so on. Thus we (incorrectly) write $T(x) = x + T(x) + T^2(x) + \cdots$. Why is this wrong? We
did not apply the Rule of Product correctly. The number of vertices in a tree T is not equal to the
total number of vertices in the k-tuple $(T_1, \ldots, T_k)$ that comes from the sons of the root: We forgot
that there is one more vertex, the root of T.

Let's do this correctly. Instead of a k-tuple of trees, we have a vertex AND a k-tuple of trees.
Thus a tree is either a single vertex, OR a single vertex AND a tree, OR a single vertex AND a
2-tuple of trees, and so on. Now we get (correctly)

$$T(x) = x + xT(x) + xT^2(x) + \cdots = \frac{x}{1 - T(x)}$$

by the Rules of Sum and Product and the formula for a sum of a geometric series. Multiplying by
$1 - T(x)$, we have $T(x) - T^2(x) = x$, which is the same as the equation for $B(x)$. Thus

Theorem 10.5 The number of n vertex unlabeled RP-trees equals the number of n leaf
unlabeled full binary RP-trees.

This was proved in Example 7.9 (p. 206) by showing that the numbers satisfied the same recursion
and in Exercise 9.3.12 (p. 266) by giving a bijection.

You should be able to derive $T(x) = x + T(x)^2$ directly from the second definition of RP-trees
in Example 7.9 (p. 206) and hence prove the theorem this way.

We've looked at two extremes: full binary trees (all nonleaf vertices have exactly 2 children) and
arbitrary trees (nonleaf vertices can have any number of children). We can study trees in between
these two extremes. Let D be a set of positive integers. Let $\mathcal{D}$ be those unlabeled RP-trees where
the number of children of each nonleaf vertex lies in D. The two extremes correspond to $D = \{2\}$ and
$D = \{1, 2, 3, \ldots\}$. If we count these trees by number of vertices, you should be able to show that

$$G_{\mathcal{D}}(x) = x + \sum_{d\in D} x\,G_{\mathcal{D}}(x)^d.$$

In general, we cannot solve this equation; however, we can simplify the sum if the elements of D lie
in an arithmetic progression. Our two extremes are examples of this. For another example, suppose
D is the set of positive odd integers. Then the sum is a geometric series with first term $xG_{\mathcal{D}}(x)$ and
ratio $G_{\mathcal{D}}(x)^2$. After some algebra, one obtains a cubic equation for $G_{\mathcal{D}}(x)$. We won't pursue this. □
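Even when the equation $G(x) = x + \sum_{d\in D} x\,G(x)^d$ cannot be solved in closed form, its
coefficients can be computed by repeated substitution, because each pass through the right side fixes
at least one more coefficient. The sketch below does this with truncated power series; the function
name and the representation as coefficient lists are our own assumptions.

```python
def count_trees(degrees, nmax):
    """Coefficients g[0..nmax] of the solution of G = x + sum over d in degrees of x*G^d.
    g[n] counts n-vertex unlabeled RP-trees in which the number of children of
    every nonleaf vertex lies in `degrees` (a collection of positive integers)."""
    def mul(p, q):                                 # product truncated after x^nmax
        r = [0] * (nmax + 1)
        for i, a in enumerate(p):
            if a:
                for j, b in enumerate(q):
                    if i + j > nmax:
                        break
                    r[i + j] += a * b
        return r

    g = [0] * (nmax + 1)
    for _pass in range(nmax):
        total = [0] * (nmax + 1)
        if nmax >= 1:
            total[1] = 1                           # the single-vertex tree
        for d in degrees:
            if d > nmax:
                continue                           # x*G^d has no terms of degree <= nmax
            power = [1] + [0] * nmax
            for _ in range(d):
                power = mul(power, g)
            for n in range(nmax):
                total[n + 1] += power[n]           # multiply G^d by x
        g = total
    return g

print(count_trees(range(1, 11), 10))   # arbitrary trees, counted by vertices
print(count_trees([2], 9))             # full binary trees, counted by vertices
print(count_trees(range(1, 11, 2), 10))  # every nonleaf vertex has an odd number of children
```

For D = {1, 2, 3, ...} the output should reproduce the numbers $b_n$ of Theorem 10.5.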
Example 10.15 Balls in boxes Problems that involve placing unlabeled balls into labeled
boxes (or, equivalently, problems that involve compositions of integers) are often easy to do using
the Rules of Sum and Product. Let $\mathcal{T}_i$ be the set of possible ways to put things into the ith box. Let
$G_{\mathcal{T}_i}$ be the generating function which is keeping track of the things in the ith box. Suppose that
what can be placed into one box is not dependent on what is placed in other boxes. The Rule of
Product (in the Cartesian product form) tells us that we can simply multiply the $G_{\mathcal{T}_i}$'s together.

How many ways can we put unlabeled balls into k labeled boxes so that no box is empty? Since
there is exactly one way to place j balls in a box for every j > 0 and no ways if j = 0 (since the box
may not be empty), we have

$$G_{\mathcal{T}_i}(x) = \sum_{j>0} x^j = \frac{x}{1-x}$$

for all i. By the Rule of Product, the generating function is

$$\frac{x}{1-x}\cdots\frac{x}{1-x} = x^k(1-x)^{-k}.$$

Since

$$(1-x)^{-k} = \sum_{i\ge0}\binom{-k}{i}(-x)^i = \sum_{i\ge0}\binom{k+i-1}{i}x^i,$$

it follows that the number of ways to distribute n unlabeled balls is $\binom{n-1}{n-k} = \binom{n-1}{k-1}$, which you found
in Exercise 1.5.4 (p. 38).

How many solutions are there to the equation $z_1 + z_2 + z_3 = n$ where $z_1$ is an odd positive
integer and $z_2$ and $z_3$ are nonnegative integers not exceeding 10? We can think of this as placing
balls into boxes where $z_i$ balls go into the ith box. Since

$$G_{\mathcal{T}_1}(x) = x + x^3 + x^5 + \cdots = x\bigl(1 + x^2 + (x^2)^2 + (x^2)^3 + \cdots\bigr) = \frac{x}{1 - x^2}$$

and

$$G_{\mathcal{T}_2}(x) = G_{\mathcal{T}_3}(x) = 1 + x + \cdots + x^{10} = \frac{1 - x^{11}}{1 - x},$$

it follows that the generating function is

$$\frac{x}{1 - x^2}\left(\frac{1 - x^{11}}{1 - x}\right)^2.$$

There isn't a nice formula for the coefficient of $x^n$.

What if we allow positive integer coefficients in our equation? For example, how many solutions
are there to $z_1 + 2z_2 + 3z_3 = n$ in nonnegative integers? In this case, put $z_1$ balls in the first box,
$2z_2$ balls in the second and $3z_3$ balls in the third. Since the number of balls in box i is a multiple of
i, $G_{\mathcal{T}_i}(x) = 1/(1 - x^i)$. By the Rule of Product $G_{\mathcal{T}}(x) = 1/((1 - x)(1 - x^2)(1 - x^3))$. This result
can be thought of as counting partitions of the number n where $z_i$ is the number of parts of size i.
By extending this idea, it follows that, if $p(n)$ is the number of partitions of the integer n, then

$$\sum_{n=0}^\infty p(n)x^n = \frac{1}{1-x}\,\frac{1}{1-x^2}\,\frac{1}{1-x^3}\cdots = \prod_{i=1}^\infty (1 - x^i)^{-1}.$$

So  far  we  have  only  used  the  Rules  of  Sum  and  Product  for  single  variable  generating  functions. 
We  need  not  limit  ourselves  in  this  manner.  As  we  will  explain: 


Observation  The  Rules  of  Sum  and  Product  apply  to  generating  functions  with  any  number 
of  variables. 




Suppose we are keeping track of m different kinds of things. Replace w by $\mathbf{w}$, an m long vector
of integers. Then $\mathbf{x}^{\mathbf{w}} = x_1^{w_1}\cdots x_m^{w_m}$. For example, if we count words by the number of vowels, the
number of consonants and the length of the word, $\mathbf{w}$ will be a 3 long vector: one component for
number of vowels, one for number of consonants and one for total number of letters. In that case,
the variables will also form a 3 long vector $\mathbf{x}$. We can replace (10.31) with

$$\sum_{B\in\mathcal{B}} \mathbf{x}^{\mathbf{w}(B)} = B(\mathbf{x}),$$

where, as we already said, $\mathbf{x}^{\mathbf{w}}$ means $x_1^{w_1}\cdots x_m^{w_m}$. The condition on w in the Rule of Product becomes

$$\mathbf{w}(T) = \mathbf{w}(T_1) + \cdots + \mathbf{w}(T_k).$$

Of course, we could choose other indices besides $1, \ldots, m$ for our vectors and even replace some of
the $x_i$'s with other letters. In the next example, we find it convenient to use $\mathbf{x} = (x_0, x_1)$.

Example 10.16  Strings of zeroes and ones  Let's look at strings of zeroes and ones. It will
be useful to have a shorthand notation for writing down strings. The empty string will be denoted
by $\lambda$. If $s$ is a string, then $(s)^k$ stands for the string $ss \ldots s$ that consists of $k$ copies of $s$ and $(s)^*$
stands for the set of strings that consist of any number of copies of $s$, i.e., $(s)^* = \{\lambda, s, (s)^2, (s)^3, \ldots\}$.
When $s$ is simply 0 or 1, we usually omit the parentheses. Thus we write $0^*$ and $1^k$ instead of $(0)^*$
and $(1)^k$.

The sequences counted by the Fibonacci numbers, namely those which contain no adjacent ones,
can be described by

\[ \mathcal{F} = 0^* \cup (0^*\, 1\, \mathcal{Z}^*\, 0^*) \quad\text{where}\quad \mathcal{Z} = 0^*\, 01. \]

This means

(a) any number of zeroes OR

(b) any number of zeroes AND a one AND any number of sequences of the form $\mathcal{Z}$ to be described
shortly AND any number of zeroes.

A sequence of the form $\mathcal{Z}$ is any number of zeroes AND a zero AND a one. You should convince
yourself that $\mathcal{F}$ does indeed give exactly those sequences which contain no adjacent ones. As you
can guess from the ANDs and ORs above, this is just the right sort of situation for the Rules of Sum
and Product.

What  good  does  such  a  representation  do  us? 


Observation  If this representation gives every pattern in exactly one way, we can mechanically
use the Rules of Sum and Product to obtain a generating function.

For a union (i.e., $\cup$ or $\{\cdots\}$), we are dealing with OR, so the Rule of Sum applies. When symbols
appear to be multiplied it means first one thing AND then another, so we can apply the Rule of
Product. For a set $S$, the notation $S^*$ means any number of copies of things in $S$. For example,
$\{0, 1\}^*$ is the set of all strings of zeroes and ones, including the empty string. If there is a unique
way to decompose elements of $S^*$ into elements in $S$, then

\[ G_{S^*} = \sum_{k=0}^{\infty} (G_S)^k = \frac{1}{1 - G_S}. \]

What does "unique" decomposition mean? When $S = \{0, 1\}$, every string of zeroes and ones has a unique
decomposition: just look at each element of the string one at a time. When $S = \{0, 01, 11\}$ we still
have unique decomposition; for example, 110001111101 decomposes uniquely as 11-0-0-01-11-11-01.
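Unique decomposition can be tested by counting, for a given string, the number of ways to cut it into pieces from $S$. The following is a small sketch under our own naming (`count_decompositions` and `is_uniquely_decomposable_up_to` are not from the text); it uses a prefix-by-prefix count, and the exhaustive check only covers strings up to a chosen length, so it is evidence rather than a proof.

```python
from itertools import product


def count_decompositions(string, pieces):
    """Number of ways to write `string` as a concatenation of elements of `pieces`."""
    ways = [0] * (len(string) + 1)
    ways[0] = 1                                   # the empty prefix: one (empty) way
    for end in range(1, len(string) + 1):
        for piece in pieces:
            # A decomposition of the prefix of length `end` ends with some piece.
            if string[:end].endswith(piece):
                ways[end] += ways[end - len(piece)]
    return ways[len(string)]


def is_uniquely_decomposable_up_to(pieces, alphabet, max_len):
    """True if no string of length <= max_len has two different decompositions."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            if count_decompositions("".join(letters), pieces) > 1:
                return False
    return True


if __name__ == "__main__":
    print(count_decompositions("110001111101", ["0", "01", "11"]))
    print(is_uniquely_decomposable_up_to(["0", "01", "11"], "01", 10))
    # A set without unique decomposition: 010 = 0-10 = 01-0.
    print(count_decompositions("010", ["0", "01", "10"]))
```

Note that a count of zero is allowed: with $S = \{0, 01, 11\}$ the string 1 is simply not in $S^*$. Uniqueness only requires that no string has two or more decompositions.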

We leave it to you to verify that our representation for $\mathcal{F}$ gives all the patterns exactly once.
Let $x_0$ keep track of zeroes and $x_1$ keep track of ones; that is, the coefficient of $x_0^n x_1^m$ in $G_{\mathcal{F}}(x_0, x_1)$
will be the number of sequences in $\mathcal{F}$ that have $n$ zeroes and $m$ ones. We have

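\[ G_{0^*} = \frac{1}{1 - G_0} = \frac{1}{1 - x_0} = (1 - x_0)^{-1}, \]

\[ G_{\mathcal{Z}^*} = G_{(0^* 01)^*} = \frac{1}{1 - G_{0^* 01}} = \frac{1}{1 - (1 - x_0)^{-1} x_0 x_1} = \frac{1 - x_0}{1 - x_0 - x_0 x_1}, \]

\[ G_{\mathcal{F}} = \frac{1}{1 - x_0} + \frac{1}{1 - x_0}\; x_1\; \frac{1 - x_0}{1 - x_0 - x_0 x_1}\; \frac{1}{1 - x_0} = \frac{1 + x_1}{1 - x_0 - x_0 x_1}. \]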
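As a sanity check on $G_{\mathcal{F}} = (1 + x_1)/(1 - x_0 - x_0 x_1)$, one can compare its coefficients with a brute-force count of short strings. This is a sketch with names of our own choosing. Clearing the denominator gives the coefficient recursion $c_{n,m} = c_{n-1,m} + c_{n-1,m-1}$ plus the numerator terms $1$ and $x_1$; the comparison with $\binom{n+1}{m}$ is an extra observation of ours, not a claim made in the text.

```python
from itertools import product
from math import comb


def brute_force_counts(max_len):
    """counts[(n, m)]: 0/1 strings with n zeroes, m ones and no adjacent ones."""
    counts = {}
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            s = "".join(bits)
            if "11" not in s:
                key = (s.count("0"), s.count("1"))
                counts[key] = counts.get(key, 0) + 1
    return counts


def series_coefficients(max_len):
    """c[n][m] = coefficient of x0^n x1^m in (1 + x1) / (1 - x0 - x0*x1)."""
    size = max_len + 1
    c = [[0] * size for _ in range(size)]
    for n in range(size):
        for m in range(size):
            value = 1 if (n == 0 and m in (0, 1)) else 0   # numerator 1 + x1
            if n >= 1:
                value += c[n - 1][m]                        # from the x0 term
                if m >= 1:
                    value += c[n - 1][m - 1]                # from the x0*x1 term
            c[n][m] = value
    return c


if __name__ == "__main__":
    max_len = 10
    counts = brute_force_counts(max_len)
    c = series_coefficients(max_len)
    for n in range(max_len + 1):
        for m in range(max_len + 1 - n):
            assert counts.get((n, m), 0) == c[n][m] == comb(n + 1, m)
    print("coefficients agree up to length", max_len)
```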

We  can  use  this  representation  to  describe  and  count  other  sequences;  however,  the  problem 
can  get  tricky  if  we  are  counting  sequences  that  must  avoid  patterns  more  complicated  than  11. 
There  are  various  ways  to  handle  the  problem.  One  method  is  by  the  use  of  sets  of  recursions,  which 
we'll  discuss  in  the  next  chapter.  Sequences  that  can  be  described  in  this  fashion  (we  haven't  said 
precisely  what  that  means)  are  called  regular  sequences.  They  are,  in  fact,  the  strings  that  can  be 
produced  by  regular  grammars,  which  we  saw  in  Section  9.2  were  the  strings  that  can  be  recognized 
by  finite  automata.  See  Exercise  10.4.19  (p.  304)  for  a  definition  and  the  connection  with  automata. 
There  is  a  method  for  translating  finite  automata  into  recursions.  We'll  explore  this  in  Example  11.2 
(p.  310).  □ 

We  close  this  section  with  an  example  which  combines  the  Rules  of  Sum  and  Product  with  some 
techniques  for  manipulating  generating  functions. 

*Example 10.17  Counting certain spanning trees  Let $G$ be a simple graph with $V = \underline{n} \cup \{0\}$
and the $2n - 1$ edges $\{i, i+1\}$ ($1 \le i < n$) and $\{0, j\}$ ($1 \le j \le n$). (Draw a picture!) How many
spanning trees does $G$ have? We'll call the number $r_n$.

To begin with, what does a spanning tree look like? An arbitrary spanning tree can be built as
follows. First, select some of the edges $\{i, i+1\}$ ($1 \le i < n$). This gives a graph $H$ with vertex set
$\underline{n}$. (Some vertices may not be on any edges.) For each component $C$ of $H$, select a vertex $j$ in $C$ and
add the edge $\{0, j\}$ to our collection. Convince yourself that this procedure gives all the trees.

We can imagine this in a different way. Let $\mathcal{T}$ be the set of rooted trees of the following form.
For each $k > 0$, let $V = \underline{k} \cup \{0\}$ and let 0 be the root. The tree contains the $k - 1$ edges $\{i, i+1\}$
($1 \le i < k$) and one edge of the form $\{0, j\}$ for some $1 \le j \le k$. Join together an ordered list of trees
in $\mathcal{T}$ by merging their roots into one vertex and relabeling their nonroot vertices $1, 2, \ldots$ in order as
shown in Figure 10.1. This process produces each spanning tree exactly once.

What we have just described is the perfect setup for the Rules of Sum (on $k$) and Product (of
$\mathcal{T}$ with itself $k$ times) when we count the number of vertices other than the vertex 0. Thus, recalling
the definition of $r_n$ at the start of the example,

\[ R(x) = \sum_{k=1}^{\infty} G_{\mathcal{T}^k}(x) = \sum_{k=1}^{\infty} T(x)^k. \]

Figure 10.1  Building a spanning tree from pieces. The pieces on the left are assembled to give the middle
figure and are then relabeled to give the right-hand figure.


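How many trees in $\mathcal{T}$ have $k$ nonroot vertices? Exactly $k$, one for each choice of a vertex to connect
to the root. Thus $T(x) = \sum_{k=1}^{\infty} k x^k$. We evaluate this sum by using derivatives as discussed in
the previous section:

\[ T(x) = \sum_{k=1}^{\infty} k x^k = \sum_{k=0}^{\infty} k x^k = \sum_{k=0}^{\infty} x\,\frac{d(x^k)}{dx} = x\,\frac{d}{dx}\Bigl(\sum_{k=0}^{\infty} x^k\Bigr) = x\,\frac{d}{dx}\Bigl(\frac{1}{1-x}\Bigr) = \frac{x}{(1-x)^2}. \]

Combining these results gives us

\[ R(x) = \frac{T(x)}{1 - T(x)} = \frac{x}{1 - 3x + x^2}. \qquad (10.33) \]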

What can we do now to get values for $r_n$? We have two choices: (a) expand by partial fractions
to get an exact value or (b) manipulate (10.33) using the ideas of the previous section to obtain a
recursion. By partial fractions, $r_n$ is the integer closest to $\alpha^n/\sqrt{5}$, where $\alpha = (3 + \sqrt{5})/2$, which gives
us a quick, accurate approximation to $r_n$ for large $n$. We leave the calculations to you and turn our
attention to deriving a recursion.

Clearing of fractions in (10.33) and equating the coefficients of $x^n$ on both sides of the resulting
equation gives the recursion

\[ r_0 = 0, \quad r_1 = 1 \quad\text{and}\quad r_n = 3r_{n-1} - r_{n-2} \ \text{ for } n \ge 2, \qquad (10.34) \]

which makes it fairly easy to build a table of $r_n$.

Can you prove (10.34) directly; i.e., without using generating functions? It's a bit tricky. With
some thought and experimentation, you may be able to discover the argument. □
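The recursion (10.34) and the closest-integer formula can be compared with a direct count of spanning trees for small $n$. The sketch below is ours, not the book's: the direct count uses Kirchhoff's matrix-tree theorem (the number of spanning trees equals the determinant of the Laplacian with one row and column deleted), a standard result that the example itself does not use. All function names are hypothetical.

```python
from fractions import Fraction
from math import sqrt


def r_by_recursion(n_max):
    """r_0, ..., r_{n_max} from r_n = 3 r_{n-1} - r_{n-2}, r_0 = 0, r_1 = 1."""
    r = [0, 1]
    for n in range(2, n_max + 1):
        r.append(3 * r[n - 1] - r[n - 2])
    return r[: n_max + 1]


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(entry) for entry in row] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                      # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


def r_by_matrix_tree(n):
    """Spanning trees of the graph of Example 10.17 for n >= 1.

    Row/column i of the reduced Laplacian stands for vertex i + 1;
    the row and column of vertex 0 are deleted.
    """
    lap = [[0] * n for _ in range(n)]
    for i in range(n):
        lap[i][i] += 1                      # edge {0, i+1}
    for i in range(n - 1):                  # edge {i+1, i+2}
        lap[i][i] += 1
        lap[i + 1][i + 1] += 1
        lap[i][i + 1] -= 1
        lap[i + 1][i] -= 1
    return determinant(lap)


if __name__ == "__main__":
    alpha = (3 + sqrt(5)) / 2
    r = r_by_recursion(12)
    for n in range(1, 13):
        assert r[n] == r_by_matrix_tree(n) == round(alpha**n / sqrt(5))
    print(r)
```

The floating-point rounding is only trustworthy for moderate $n$; for large $n$ the integer recursion is the safer way to tabulate $r_n$.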


Exercises 


10.4.1.    Let $\mathcal{T}$ be a collection of structures. Suppose that $w(T) \ne 0$ for all $T \in \mathcal{T}$. Prove the following
results.

(a) The generating function for $k$-lists of structures, with repetitions allowed, is $(G_{\mathcal{T}})^k$.

(b) The generating function for lists of structures, with repetitions allowed, is $(1 - G_{\mathcal{T}})^{-1}$. Here lists
of any length are allowed, including the empty list.

(c) If $F$ is a generating function, let $F^{(k)}$ denote the result of replacing all the variables by their
$k$th powers. For example, $F^{(k)}(x, y) = F(x^k, y^k)$. Show that the generating function for sets of
structures, where each structure must come from $\mathcal{T}$, is

\[ \exp\Bigl(\sum_{k=1}^{\infty} \frac{(-1)^{k-1}\, G_{\mathcal{T}}^{(k)}}{k}\Bigr). \]

Hint. Show that the answer is

\[ \prod_{T \in \mathcal{T}} \bigl(1 + x^{w(T)}\bigr), \]

replace $(1 + x^{w(T)})$ with $\exp(\ln(1 + x^{w(T)}))$, expand the logarithm by Taylor's Theorem and
rearrange the terms.

(d) Show that the generating function for multisets of structures is

\[ \exp\Bigl(\sum_{k=1}^{\infty} \frac{G_{\mathcal{T}}^{(k)}}{k}\Bigr). \]

10.4.2.  Return to Exercise 10.2.4 (p. 280). There we counted the number of sequences of zeroes, ones and twos
with no adjacent ones and no adjacent twos. Show that part (a) of that exercise can be rewritten as
follows.

A sequence of the type we want is either

•  an alternating sequence of ones and twos OR

•  a sequence of the type we want AND a zero AND an alternating sequence of ones and twos.

Here the alternating sequence may be empty. Use this characterization to deduce an equation that
can be solved for $S(x)$.

10.4.3.  Using the notation introduced in Example 10.16, write out expressions for strings satisfying the
following properties. Do this in such a way that each string is generated uniquely and then use your
representation to get the generating function for the number of patterns of length $n$. Finally, obtain
a recursion from your generating function. Remember to include initial conditions for the recursion.

(a) Strings of zeroes, ones and twos that do not have the pattern 02 anywhere.

Hint. Except possibly for a run of twos at the very start of the string, every 2 must be preceded
by a 1 or a 2.

(b) Strings of zeroes and ones such that each string of ones is followed by a string of at least $k$ zeroes;
i.e., if it starts with a string of zeroes, that can be of any length, but every other string of zeroes
must have length at least $k$. Use the notation $0^k$ to stand for a string of $k$ zeroes.

(c) Strings of zeroes and ones such that each maximal string of ones (i.e., its ends are the ends of
the sequence and/or zeroes) has odd length.

10.4.4.  Let $q_n$ be the number of partitions of $n$ with no repeated parts allowed and, as usual, let $p_n$ be the
number of all partitions of $n$. Let $q_0 = 1$.

(a) Show that

\[ Q(x) = \sum_{n=0}^{\infty} q_n x^n = \prod_{i=1}^{\infty} (1 + x^i). \]

(b) Let $P(x)$ be the generating function for partitions of a number. Show that $Q(-x)P(x) = 1$.
Equate coefficients of $x^n$ for $n > 0$ and then rearrange to avoid subtractions. Interpret the
rearranged result combinatorially. Can you give a direct proof of it?

(c) Let $q_{n,k}$ (resp. $p_{n,k}$) be the number of partitions counted in $q_n$ (resp. $p_n$) in which no part exceeds $k$.
Obtain formulas for $\sum_{n \ge 0} q_{n,k} x^n$ and $\sum_{n \ge 0} p_{n,k} x^n$.

10.4.5.  Let a "pile" be, roughly, a two dimensional stack of square blocks resting on a flat surface, with each
block directly on top of another and each row not having gaps. A more formal definition of a pile of
height $h$ is a sequence of $2h$ integers such that

\[ 0 = a_1 \le a_2 \le \cdots \le a_h < b_h \le \cdots \le b_2 \le b_1. \]

Here a block has width 1 and, counting rows with the first row on the bottom, the left end of row $i$ is
at $a_i$ and the right end is at $b_i$. The number of blocks in the $i$th row is $b_i - a_i$ and the total number
of blocks in the pile is $\sum_{i=1}^{h} (b_i - a_i)$. Let $s_n$ be the number of $n$-block piles and $s_{n,h}$ the number of
those of height $h$. Obtain a formula for $\sum_{n \ge 0} s_n x^n$ and $\sum_{n \ge 0} s_{n,h} x^n$.

Hint. The generating function for partitions with no part exceeding $k$ will be useful.

10.4.6.  Let $a_1 < a_2 < \cdots < a_k$ be a $k$ element subset of $\underline{n} = \{1, 2, \ldots, n\}$. We will study subsets with
restrictions on the $a_i$.

(a) Let $a_0 = 0$. By looking at $a_i - a_{i-1}$, show that there is a bijection between $k$ element subsets of
$\underline{n}$ and $k$ long sequences of positive integers with sum not exceeding $n$.

(b) Let $u_n$ be the number of $k$ element subsets of $\underline{n}$. Use (a) to show that

\[ U(x) = \Bigl(\sum_{i \ge 1} x^i\Bigr)^{k} \Bigl(\sum_{i \ge 0} x^i\Bigr) = \frac{x^k}{(1-x)^{k+1}}. \]

(Do not use the fact that $u_n = \binom{n}{k}$.)

(c) Let $t_n$ be the number of $k$ element subsets $a_1 < a_2 < \cdots$ of $\underline{n}$ such that $i$ and $a_i$ have the same
parity. In other words, $a_{2j}$ is even and $a_{2j+1}$ is odd. Show that

\[ T(x) = \frac{x^k}{(1-x)(1-x^2)^k} = \frac{(1+x)\,x^k}{(1-x^2)^{k+1}}. \]

(d) Let $\lfloor x \rfloor$ be the result of rounding $x$ down; e.g., $\lfloor 3.5 \rfloor = 3$. Show that $t_n = \binom{\lfloor (n+k)/2 \rfloor}{k}$.

(e) We call $(a_i, a_{i+1})$ a succession if they differ by one. Let $s_{n,j}$ be the number of $k$ element subsets
of $\underline{n}$ with exactly $j$ successions. Show that

\[ S(x, y) = \frac{x}{(1-x)^2}\,(xy + x^2 + x^3 + \cdots)^{k-1} = \frac{x^k (x + y - xy)^{k-1}}{(1-x)^{k+1}}. \]

(f) Show that $\sum_{n \ge 0} s_{n,j}\, x^n = \binom{k-1}{j}\, x^{2k-j-1} (1-x)^{-(k+1-j)}$.

(g) Express $s_{n,j}$ as a product of two binomial coefficients. Check your result by listing all 4 element
subsets of $\{1, \ldots, 6\}$ and determining how many successions they have.

10.4.7.  Recall that a binary RP-tree is an RP-tree in which each vertex may have at most two sons. The
set $\mathcal{T}$ of such trees was studied in Exercise 10.2.6, where we counted them by number of vertices.

(a) Using the Rules of Sum and Product, derive the relation $T(x) = x + xT(x) + xT(x)^2$ that led
to

\[ T(x) = \frac{1 - x - \sqrt{1 - 2x - 3x^2}}{2x} \]

in Exercise 10.2.6.

(b) Discuss how you might compute the number of such trees. In particular, can you find a simple
expression as a function of $n$?

10.4.8.  Change the definition in Exercise 10.4.7 so that, if a node has just one son, then we distinguish
whether it is a right or a left son. (This somewhat strange sounding distinction is sometimes
important.) How many such trees are there with $n$ internal vertices?

10.4.9.  A rooted tree will be called "contractible" if it has a vertex with just one son since one can imagine
combining that vertex's information with the information at its son.

(a)  Find  the  generating  function  for  the  number  of  unlabeled  noncontractible  RP-trees,  counting 
them  by  number  of  vertices. 

(b)  Find  the  generating  function  for  the  number  of  unlabeled  noncontractible  RP-trees,  counting 

them  by  number  of  leaves. 

(c)  Obtain  a  linear  differential  equation  with  polynomial  coefficients  and  thence  a  recursion  from 
each  of  the  generating  functions  in  this  problem. 

Hint.  Solve  for  the  square  root,  differentiate,  multiply  by  the  square  of  the  square  root  and  then 
replace  the  square  root  that  remains. 

10.4.10.  Let $t_{n,k}$ be the number of RP-trees with $n$ leaves and $k$ internal vertices (i.e., nonleaves).

(a) Find the generating function $T(x, y)$.

(b) Using the previous result, prove that $t_{n,k} = t_{k,n}$ when $n + k > 1$.
Hint. Compare $T(x, y)$ and $T(y, x)$.

*(c) Find a bijection that proves $t_{n,k} = t_{k,n}$ when $n + k > 1$; that is, find a map from RP-trees
to RP-trees that carries leaves to internal vertices and vice versa for trees with more than one
vertex. Write out your bijection for RP-trees with 5 vertices.
Hint. Describe the construction recursively (or locally).

10.4.11.  Let $D$ be a set of nonnegative integers such that $0 \in D$. For this exercise, we'll say that an RP-tree
is of outdegree $D$ if the number of sons of each vertex lies in $D$. Thus, full binary RP-trees are of
outdegree $\{0, 2\}$.

(a) Let $T_D(x)$ be the generating function for unlabeled RP-trees of outdegree $D$ by number of
vertices. Prove that

\[ T_D(x) = x \sum_{d \in D} T_D(x)^d. \]

(b) Show that the previous formula allows us to compute $T_D(x)$ recursively.

(c) Let $L_D(x)$ be the generating function for unlabeled RP-trees of outdegree $D$ by number of leaves.
Show that it doesn't make sense to talk about $L_D(x)$ when $1 \in D$, that

\[ L_D(x) = \sum_{d \in D} L_D(x)^d - 1 + x, \]

and that this allows us to compute $L_D(x)$ recursively when $1 \notin D$.

10.4.12.  We have boxes labeled with pairs of numbers like (2, 6). The labels have the form $(i, j)$ for $1 \le i \le 3$
and $1 \le j \le k$. Thus we have $3k$ boxes. Unlabeled balls are placed into the boxes. This must be done
so that the number of balls in box $(i, j)$ is a multiple of $i$ and, for each $j$, the total number of balls
in boxes $(1, j)$, $(2, j)$ and $(3, j)$ is at most 5. What is the generating function for the number of ways
to place $n$ balls?

Hint. Find the generating function for placing balls into $(1, *)$, $(2, *)$ and $(3, *)$ and then use the Rule
of Product.

*10.4.13.  An unlabeled full binary rooted tree is like the ordered (i.e., plane) situation except that we make
no distinction between left and right sons. Let $\beta_n$ be the number of such trees with $n$ leaves and let
$B(x) = \sum \beta_n x^n$. Show that $B(x) = x + (B(x)^2 + B(x^2))/2$.

Figure 10.2    A path for $n = 4$ for Exercise 10.4.14.


10.4.14.  Imagine the plane with lines like a sheet of graph paper; i.e., the lines $x = k$ and the lines $y = k$
are drawn in for all integers $k$. Think of the intersections of the lines as vertices and the line segments
connecting them as edges. The portion of the plane with $0 \le x \le n$ and $0 \le y \le n$ is then a simple
graph with $(n + 1)^2$ vertices. Let $a_n$ be the number of paths from the vertex $(0, 0)$ to the vertex
$(n, n)$ that never go down or left and that remain above the line $x = y$ except at $(0, 0)$ and $(n, n)$.
Figure 10.2 shows such a path for $n = 4$. We could describe such a path more formally as a sequence
$(x_i, y_i)$ of pairs of nonnegative integers such that $x_0 = y_0 = 0$, $x_{2n} = y_{2n} = n$, $x_i < y_i$ for $0 < i < 2n$
and $(x_i, y_i) - (x_{i-1}, y_{i-1})$ equals either $(1, 0)$ or $(0, 1)$. Draw some pictures to see what this looks
like.

(a) Show that $a_n$ is the number of sequences $s_1, \ldots, s_{2n}$ containing exactly $n$ $-1$'s and $n$ $1$'s such
that $s_1 + \cdots + s_k > 0$ for $0 < k < 2n$.

(b) By looking at $s_2, \ldots, s_{2n-1}$ for $n > 1$, conclude that $A(x) = x + \sum_{k \ge 1} x A(x)^k$.

(c) Determine $a_n$. Note that this number is the same as the number of unlabeled full binary RP-trees
with $n$ leaves, which is the same as the number of unlabeled RP-trees with $n$ vertices.

*(d) In the previous part, you concluded that the set $S_n$ of paths of a certain type from $(0, 0)$ to
$(n, n)$ has the same size as the set $T_n$ of unlabeled RP-trees with $n$ vertices. Find a bijection
$f_n : S_n \to T_n$, and thus prove this equality without the use of generating functions.

10.4.15.  Fix a set $S$ of size $s$. For $n \ge 1$, let $a_{n,k}$ be the number of $n$ long ordered lists that can be made
using $S$ so that we never have more than $k$ consecutive identical entries in the list. Thus, with $k \ge n$
there is no restraint while with $k = 1$ adjacent entries must be different. Let $A_k(x) = \sum_{n \ge 0} a_{n,k} x^n$.
There are various approaches to $A_k(x)$.

(a) By considering the last run of identical entries in the list and using the Rules of Sum and Product,
show that

\[ A_k(x) = s(x + x^2 + \cdots + x^k) + A_k(x)(s-1)(x + x^2 + \cdots + x^k). \]

(b) Find an explicit formula for $A_k(x)$.

(c) Show that $a_{n+1,k} = s\,a_{n,k} - (s-1)\,a_{n-k,k}$ for $n > k$ by using the generating function.

(d) Derive the previous recursion by a direct argument.

10.4.16.  We claim that the set of sequences of zeroes and ones that do not contain either 101 or 111 is
described by

\[ 0^* \cup \bigl(0^* (1 \cup 11)(000^*(1 \cup 11))^*\, 0^*\bigr) \]

and each such sequence has a unique description. You need not verify this. Let $a_n$ be the number of
such sequences of length $n$ and let $A(x)$ be the generating function for $a_n$.

(a) Derive the formula

\[ A(x) = \frac{1 + x + 2x^2 + x^3}{1 - x - x^3 - x^4}. \]

(b) Using $A(x)$, obtain the recursion $a_n = a_{n-1} + a_{n-3} + a_{n-4}$ for $n \ge 4$ and find the initial
conditions.

(c) Using $1 - x - x^3 - x^4 = (1 - x - x^2)(1 + x^2)$, derive the formula

\[ a_n = \frac{7F_{n+1} + 4F_n - b_n}{5} \quad\text{where}\quad b_n = \begin{cases} 2(-1)^{n/2} & \text{if } n \text{ is even,} \\ (-1)^{(n-1)/2} & \text{if } n \text{ is odd,} \end{cases} \]

where the Fibonacci numbers are given by $F_0 = 0$, $F_1 = 1$ and $F_n = F_{n-1} + F_{n-2}$ for $n \ge 2$;
that is, their generating function is $x/(1 - x - x^2)$.

(d) Prove that

\[ a_{2n} = F_{n+2}^2 \quad\text{and}\quad a_{2n+1} = F_{n+2} F_{n+3} \quad\text{for } n \ge 0. \]

Hint. Show that the recursion and initial conditions are satisfied.

10.4.17.  Using partial fractions, obtain a formula for $r_n$ from (10.33).

*10.4.18.  Let $G$ be the simple graph with vertex set $\underline{n} \cup \{0\}$ and the $2n$ edges $\{n, 1\}$, $\{i, i+1\}$ ($1 \le i < n$) and
$\{0, j\}$ ($1 \le j \le n$), except for $n = 1, 2$ where we must avoid adding $\{n, 1\}$ in order to get a simple
graph. In other words, $G$ is like the graph in Example 10.17 except that one more edge $\{1, n\}$ has
been added so that the picture looks like a wheel with spokes for $n > 2$. We want to know how many
spanning trees $G$ has.

(a) Let $\mathcal{T}$ be as in Example 10.17 and let $\mathcal{T}'$ consist of the trees in $\mathcal{T}$ with one of the nonroot vertices
marked. Choose one tree from $\mathcal{T}'$ and then a, possibly empty, sequence of trees from $\mathcal{T}$. Suppose
we have a total of $n$ nonroot vertices. Merge the root vertices and relabel the nonroot vertices 1 to
$n$, starting with the marked vertex in the tree from $\mathcal{T}'$ and proceeding cyclically until all nonroot
vertices have been labeled. Explain why this gives all the spanning trees exactly once.

(b) Show that $G_{\mathcal{T}'}(x) = x\,(d/dx)(G_{\mathcal{T}}(x)) = x(1 + x)/(1 - x)^3$.

(c) Show that the generating function for the spanning trees is

\[ \frac{x(1 + x)}{(1 - x)(1 - 3x + x^2)}. \]

(d) Show that the number of spanning trees is $2r_{n+1} - 3r_n - 2$, where $r_n$ is given in Example 10.17.

10.4.19.  We define a set of regular sequences (or regular strings) on the "alphabet" $A$. (An alphabet is any
finite set.) Let $R$, $R_1$, and $R_2$ stand for sets of regular strings on $A$. The sets of regular strings on
$A$ are the empty set, the sets $\{a\}$ where $a \in A$, and the sets that can be built recursively using the
following operations:

•  union ("or") of sets, i.e., the set of all strings that belong to either $R_1$ or $R_2$;

•  juxtaposition ("and then"), i.e., the set of all strings $r_1 r_2$ where $r_1 \in R_1$ and $r_2 \in R_2$;

•  arbitrary iteration $R^*$, i.e., for all $n \ge 0$, all strings of the form $r_1 r_2 \cdots r_n$ where $r_i \in R$.
(The empty string is obtained when $n = 0$.)

See Example 10.16 for a specific example of a set of regular sequences. The purpose of this exercise
is to construct a nondeterministic finite automaton that recognizes any given set of regular strings.
Nondeterministic finite automata are defined in Section 6.6 (p. 189). We will build up the machine
by following the way the strings are built up.

(a) Let $\mathcal{A}$ be an automaton. Show that there is another automaton $S(\mathcal{A})$ that recognizes the same
strings and has no edges leading into the start state.

Hint. Create a new state, let it be the start state and let it have edges to all of the states the
old start state did. Remember to specify the accepting states, too.

(b) If $\mathcal{A}$ recognizes the set $A$ and $\mathcal{B}$ recognizes the set $B$, construct an automaton that recognizes
the set $A \cup B$.

Hint. Adjust the idea in (a).

(c) If $\mathcal{A}$ recognizes $A$, construct an automaton that recognizes $A^*$.
Hint. Add some edges.

(d) If $\mathcal{A}$ recognizes the set $A$ and $\mathcal{B}$ recognizes the set $B$, construct an automaton that recognizes
$AB$; i.e., the set $A \times B$.


Notes  and  References 


The classic books on generating functions are those by MacMahon [6] and Riordan [7]. They are
quite difficult reading and do not take a "combinatorial" approach to generating functions. There are
various combinatorial approaches. Some can be found in the texts by Wilf [10] and Stanley [9, Ch. 3]
and in the articles by Bender and Goldman [1] and Joyal [5]. The articles are rather technical.

Parts of the texts by Greene and Knuth [4] and by Graham, Knuth and Patashnik [3] are oriented
toward computer science uses of generating functions. See also the somewhat more advanced text
by Sedgewick and Flajolet [8]. Wilf [10] gives a nice introduction to generating functions. Goulden
and Jackson's book [2] contains a wealth of material on generating functions, but is at a higher level
than our text.

We have studied only the simplest sorts of recursions. Recursions that require more sophisticated
methods are common, as are recursions that cannot be solved exactly. Sometimes approximate
solutions are possible. We don't know of any systematic exposition of techniques for such problems.

We have not dealt with the problem of defining formal power series; that is, defining a generating
function so that the convergence of the infinite series is irrelevant. An introduction to this can be
found in the first few pages of Stanley's text [9].

CHAPTER 11

*Generating Function Topics


Introduction 


The four sections in this chapter deal with four distinct topics: systems of recursions, exponential
generating functions, Polya's Theorem and asymptotic estimates. These sections can be read
independently of one another. The section on "asymptotic" estimates refers to formulas in earlier sections
of the chapter, but there is no need to read the section containing the formula.

"Systems  of  recursions,"  as  you  might  guess,  deals  with  the  creation  and  solution  of  sets  of 
simultaneous  recursions.  These  can  arise  in  a  variety  of  ways.  One  source  of  them  is  algorithms  that 
involve  interrelated  recursive  subroutines.  Another  source  is  situations  that  can  be  associated  with 
grammars.  General  context  free  grammars  can  lead  to  complicated  recursions,  but  regular  grammars 
(which  are  equivalent  to  finite  state  automata)  lead  to  simple  systems  of  recursions.  We  limit  our 
attention  to  these  simple  systems. 

Exponential  generating  functions  are  very  much  like  ordinary  generating  functions.  They  are 
used  in  situations  where  things  are  labeled  rather  than  unlabeled.  For  example,  they  are  used  to 
study  partitions  of  a  set  because  each  of  the  elements  in  the  set  is  different  and  hence  can  be  thought 
of  as  a  labeled  object.  We  will  briefly  study  the  Rules  of  Sum  and  Product  for  them  as  well  as  a 
useful  consequence  of  the  latter — the  Exponential  Formula. 

Burnside's  Lemma  (Theorem  4.5  (p.  112))  is  easily  seen  to  apply  to  the  situation  in  which  the 

objects  being  counted  have  "weights".  As  a  result,  we  can  introduce  generating  functions  into  the 
study  of  objects  with  symmetries.  This  "weighted  Burnside  lemma"  has  a  variety  of  important 
special  cases.  We  will  study  the  simplest,  and  probably  most  important  one — Polya's  theorem. 

Suppose we are studying some sequence of numbers $a_n$ and want to know how the sequence
behaves when $n$ is large. Usually it grows rapidly, but we want to know more than that: we want a
relatively simple formula that provides some sort of estimate for $a_n$. Stirling's formula (Theorem 1.5
(p. 12)) is an example of such a formula. This subject is referred to as asymptotics or asymptotic
analysis. Because the proofs of results in this area require either messy estimates, a knowledge of
complex variables or both, we will not actually prove the estimates that we derive.

11.1    Systems of Recursions


So  far  we've  dealt  with  only  one  recursion  at  a  time.  Now  we  look  at  ways  in  which  systems  of 

recursions  arise  and  adapt  our  methods  for  a  single  recursion  to  solving  simple  systems  of  recursions. 
The  adaptation  is  straightforward — allow  there  to  be  more  than  one  recursion.  As  usual,  examples 
are  the  best  way  to  see  what  this  is  all  about. 


Example 11.1  Counting Batcher sort comparators  Let's study the Batcher sort. As with
our study of merge sorting in Example 10.4 (p. 278), we'll limit the number of things being sorted to
a power of 2 for simplicity. We want to determine $b_k$, the number of comparators in a Batcher sort
for $2^k$ things. We'll rewrite the Batcher sorting algorithm in Section 8.3.2 to focus on the number
of things being sorted and number of comparators. The comments indicate the contributions to the
recursion.

    BSORT(2^k things)                  /* uses b_k comparators */
        If k = 0, Return               /* b_0 = 0 */
        BSORT(2^(k-1) things)          /* b_k = b_(k-1) */
        BSORT(2^(k-1) things)          /*       + b_(k-1) */
        BMERGE(2^k things)             /*       + m_k */
        Return
    End

    BMERGE(2^k things)                 /* uses m_k comparators */
        If k = 0, Return               /* m_0 = 0 */
        If k = 1,                      /* m_1 = 1 */
            one Comparator and Return
        End if
        BMERGE2(2^k things)            /* m_k = t_k */
        2^(k-1) - 1 Comparators        /*       + 2^(k-1) - 1 */
        Return
    End

    BMERGE2(2^k things)                /* uses t_k comparators */
        BMERGE(2^(k-1) things)         /* t_k = m_(k-1) */
        BMERGE(2^(k-1) things)         /*       + m_(k-1) */
        Return
    End

Note that since BMERGE2($2^k$ things) is never called for $k < 2$, we can define $t_0$ and $t_1$ arbitrarily.

How should we choose the values of $t_0$ and $t_1$? In the end, it doesn't really matter because the
answers will be unaffected by our choices. On the other hand, how we choose these values can affect
the amount of work we have to do. There are two rules of thumb that can help in making such
choices:

•  Choose the initial values so as to minimize the number of special cases of the recursion. (For
example, the recursion below has two special cases for $m_k$.)

•  Choose simple values like zero.


We'll  set  to  =  0  and  ti  =  2mo  =  0. 
Copying the recursions from the comments in the code we have

    b_k = 0                        if k = 0,
          2b_{k-1} + m_k           if k > 0;

    m_k = 0                        if k = 0,
          1                        if k = 1,                  (11.1)
          t_k + 2^{k-1} - 1        if k > 1;

    t_k = 0                        if k = 0,
          2m_{k-1}                 if k > 0.


We now apply our six steps for solving recursions (p. 277), allowing more than one recursion and
more than one resulting equation. Let B(x), M(x) and T(x) be the generating functions. The result
of applying the first four steps to (11.1) is

    B(x) = 2xB(x) + M(x),

    M(x) = x + \sum_{k≥2} t_k x^k + \sum_{k≥2} (2^{k-1} - 1) x^k
         = x + \sum_{k≥0} t_k x^k + \sum_{k≥1} (2^{k-1} - 1) x^k
         = x + T(x) + x \sum_{k≥0} (2x)^k - \sum_{k≥1} x^k
         = x + T(x) + x/(1 - 2x) - x/(1 - x),

    T(x) = 2xM(x).

We now carry out Step 5 by solving these three linear equations in the three unknowns B(x), M(x)
and T(x). From the first equation, we have

    B(x) = M(x)/(1 - 2x).                                     (11.2)

Combining the equations for M(x) and T(x), we have

    M(x) = x + 2xM(x) + x/(1 - 2x) - x/(1 - x).

Solving this for M(x) and substituting the result in (11.2), we obtain

    B(x) = x/(1 - 2x)^2 + x/(1 - 2x)^3 - x/((1 - x)(1 - 2x)^2).

Our formula for B(x) can be expanded using partial fractions. We won't carry out the calculations
here except for noting that we can rewrite this as

    B(x) = x/(1 - 2x)^3 - x/(1 - 2x)^2 + 2x/(1 - 2x) - x/(1 - x).

The result is

    b_k = 2^{k-2}(k^2 - k + 4) - 1.                           (11.3)

How does this result compare with the upper bound on the number of comparisons in merge sort
that we found in Example 10.4? There we obtained an upper bound of n log_2 n and here we obtained
an actual value that is close to (1/4) n (log_2 n)^2, where n = 2^k. Thus Batcher sort is a poor
software sorting algorithm. On the other hand, it is a much better network sorting algorithm than
the only other one we studied, the brick wall. □
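As a quick sanity check on (11.3), the three linked recursions read off from the pseudocode comments can be iterated directly and compared with the closed form. The following is a minimal Python sketch, not part of the original text; the function names are illustrative and it assumes the choices t_0 = t_1 = 0 made above.

```python
def batcher_comparators(kmax):
    """Iterate the linked recursions for b_k, m_k and t_k up to k = kmax."""
    b, m, t = [0], [0], [0]              # b_0 = m_0 = t_0 = 0
    for k in range(1, kmax + 1):
        t.append(2 * m[k - 1])           # t_k = 2 m_{k-1}
        if k == 1:
            m.append(1)                  # m_1 = 1
        else:
            m.append(t[k] + 2 ** (k - 1) - 1)
        b.append(2 * b[k - 1] + m[k])    # b_k = 2 b_{k-1} + m_k
    return b


def closed_form(k):
    """b_k = 2^(k-2) (k^2 - k + 4) - 1, arranged to stay in integers."""
    return (2 ** k * (k * k - k + 4)) // 4 - 1


b = batcher_comparators(12)
assert all(b[k] == closed_form(k) for k in range(13))
print(b[:6])   # [0, 1, 5, 19, 63, 191]
```

Only b_k is returned because that is the quantity of interest; m_k and t_k play the same auxiliary role here as M(x) and T(x) do in the generating function calculation.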


In  the  next  example  we  will  work  with  a  set  of  linked  recursions  in  the  context  of  a  directed 
graph  or,  equivalently,  a  finite  state  automaton. 

Figure  11.1  Three  ways  to  place  nonoverlapping  dominoes  on  2  by  5  boards.  A  domino  is  indicated  by 
a  white  rectangle  and  the  board  is  black. 


Example 11.2 Rectangular arrays, digraphs and finite automata Imagine a k by n
array of squares. We would like to place dominoes on the array so that each domino covers exactly
two squares and the dominoes do not overlap. Not all squares need be covered. Thus there may be
any number of dominoes from none to nk/2. Let a_n(k) be the number of ways to do this. Figure
11.1 shows some ways to place dominoes on a 2 by 5 board.

We can work out a recursion for a_n(1) quite easily. Let's do it. When n > 1, the number of
arrangements with the last square empty is a_{n-1}(1) and the number of arrangements with a domino
in the last square is a_{n-2}(1). This is the same as the recursion for the Fibonacci numbers, but the
initial conditions are a bit different. (What are they?) After some calculation, you should be able to
get a_n(1) = F_{n-1}.

There is a quicker way to get a_n(1). If we replace each empty square with a "0" and each domino
with "10", we obtain an n-long string of zeroes and ones ending in zero and having no adjacent ones.
Conversely any such string of zeroes and ones gives rise to a placement of dominoes. By stripping
off the rightmost zero in the string, we have that a_n(1) = F_{n-1}.
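The correspondence between domino placements on a 1 by n board and 0/1 strings can be checked by brute force for small n. The sketch below is only an illustration under our own conventions: the function names are hypothetical, and we take the empty board to have exactly one placement.

```python
from itertools import product


def count_by_recursion(n):
    """a_n(1) from a_n = a_{n-1} + a_{n-2}, with one placement each for n = 0 and n = 1."""
    a, b = 1, 1                      # (a_0, a_1)
    for _ in range(n):
        a, b = b, a + b
    return a


def count_by_strings(n):
    """Count n-long 0/1 strings that end in 0 and have no adjacent ones."""
    return sum(
        1
        for s in product("01", repeat=n)
        if s[-1] == "0" and all(not (x == "1" == y) for x, y in zip(s, s[1:]))
    )


for n in range(1, 11):
    assert count_by_recursion(n) == count_by_strings(n)
print([count_by_recursion(n) for n in range(1, 7)])   # [1, 2, 3, 5, 8, 13]
```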

For k > 1, a_n(k) is much more complicated. Try to write down a recursion for a_n(2), or calculate
it in any other manner.

*        *        *       Stop  and  think  about  this!        *        *  * 

We  will  show  how  to  use  a  directed  graph  to  produce  our  patterns  of  dominoes.  The  graph  can 
be  regarded  as  a  finite  state  machine  or,  more  specifically,  a  nondeterministic  finite  automaton 
(Section  6.6).  We  will  show  how  to  associate  generating  functions  with  such  digraphs. 

Our machine (digraph) has four states (vertices), which we call clear, first, second and both. We
imagine moving across the 2 x n array, one column at a time from left to right, placing dominoes as we
reach a column. When we decide to place a domino in a horizontal position starting in column j of the
array, we are covering a square in column j + 1 that we haven't yet reached. We call this covering of
an unreached square a "commitment." At the next step we must take into account any commitment
that was made in the previous step. Our states keep track of the unsatisfied commitments; that is,
if we are filling the jth column, the commitments made at column j - 1:

(a) clear means there are no commitments to take into account;

(b) first means we made a commitment by placing a domino in the first row of column j - 1
which runs over into column j;

(c) second means we made a commitment by placing a domino in the second row;

(d) both means we made commitments by placing dominoes in both rows.

Using the letters c, f, s and b, the sequences of states for the columns in Figure 11.1 are fsfcc,
fcfsc and ccscc, respectively. The columns to which the commitments are made are 2 through 6. No
commitments are made to column 1 because no dominoes are placed before column 1. If we wanted
to reflect this fact in our sequences, we could add an entry of c for column 1 at the beginning of
each sequence. Note that all of these strings end with a c because no dominoes hang over the right
end of the board.

There is an edge in our digraph from state x to state y for each possible way to get from state
x at column j of the array to state y at column j + 1 of the array. For example, we can go from
clear to clear by placing no dominoes in column j or by placing one vertical domino in the column.
Thus there are two loops at clear. Since we cannot go from first to both, there are no edges from
first to both. The following table summarizes the possibilities. An entry m_{x,y} in position (x, y) is the
number of edges from state x to state y.

              clear   first   second   both
    clear       2       1       1        1
    first       1       0       1        0
    second      1       1       0        0
    both        1       0       0        0

To complete our picture, we need the initial and accepting states. From the discussion at the end
of the previous paragraph, you should be able to see that our initial state should be clear and that
there should be one accepting state, which is also clear.

Let c_n be the number of ways to reach the state clear after taking n steps from the starting
state, clear. This means that no dominoes hang over into column n + 1. Note that c_n is the number
of ways to place dominoes on a 2 x n board. Define f_n, s_n and b_n in a similar way according to the
state reached after n steps from the starting state. Let C(x), etc., be the corresponding generating
functions.

We are only interested in c_n (or C(x)), so why introduce all these other functions? The edges
that lead into a state give us a recursion for that state. For example, looking down the column labeled
clear, we see that

    c_{n+1} = 2c_n + f_n + s_n + b_n                          (11.4)

for n ≥ 0. Thus, when we study c_n this way, we end up needing to look at the functions f_n, s_n and
b_n, too.

In a similar manner to the derivation of (11.4),

    f_{n+1} = c_n + s_n,    s_{n+1} = c_n + f_n,    b_{n+1} = c_n          (11.5)

for n ≥ 0. The initial conditions for the recursions (11.4) and (11.5) are the values at n = 0. Since
the initial state, clear, is the only state we can get to in zero steps,

    c_0 = 1 and f_0 = s_0 = b_0 = 0.

To see how much work these recursions involve, use them to find c_8, the number of ways to place
dominoes on a 2 by 8 board. Can we find an easier way to calculate c_n; for example, one that does
not require the values of f, s and b as well? Yes.
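For readers who would rather let a machine do the bookkeeping, the linked recursions can be run straight from the transfer table. This is a small Python sketch under our own naming (STATES, EDGES and walks_from_clear are not from the text); each step replaces the four counts by the sums over incoming edges, which is what the linked recursions for c_n, f_n, s_n and b_n say.

```python
# EDGES[x][y] = number of edges from state x to state y (the table of this example).
STATES = ("clear", "first", "second", "both")
EDGES = {
    "clear":  {"clear": 2, "first": 1, "second": 1, "both": 1},
    "first":  {"clear": 1, "first": 0, "second": 1, "both": 0},
    "second": {"clear": 1, "first": 1, "second": 0, "both": 0},
    "both":   {"clear": 1, "first": 0, "second": 0, "both": 0},
}


def walks_from_clear(n):
    """Number of n-step walks from clear to each state."""
    counts = {s: 0 for s in STATES}
    counts["clear"] = 1                      # c_0 = 1, f_0 = s_0 = b_0 = 0
    for _ in range(n):
        counts = {
            y: sum(counts[x] * EDGES[x][y] for x in STATES) for y in STATES
        }
    return counts


print(walks_from_clear(8)["clear"])   # 7573
```

Note that all four counts must be carried along at every step even though only the count for clear is wanted at the end.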

We begin by converting the recursions into a set of linked generating functions using our method
for attempting to solve recursions. Multiplying both sides of the equations (11.4) and (11.5) by x^{n+1},
summing over n ≥ 0 and using the initial conditions, we obtain

    C(x) = x(2C(x) + F(x) + S(x) + B(x)) + 1

    F(x) = x(C(x) + S(x))

    S(x) = x(C(x) + F(x))

    B(x) = xC(x).

We have four linear equations in the four unknowns C(x), F(x), S(x) and B(x). Let's solve
them for C(x). Subtracting the second and third equations, we get F(x) - S(x) = x(S(x) - F(x))
for all x and so S(x) = F(x). Thus F(x) = xC(x) + xF(x) and so F(x) = S(x) = xC(x)/(1 - x).
Substituting this and B(x) = xC(x) into the first equation gives us

    C(x) = 2xC(x) + 2x^2 C(x)/(1 - x) + x^2 C(x) + 1.

With a bit of algebra, we easily obtain

    C(x) = (1 - x)/(1 - 3x - x^2 + x^3).                      (11.6)

A method of obtaining this by working directly with the table, thought of as a matrix

        ( 2  1  1  1 )
    M = ( 1  0  1  0 )                                        (11.7)
        ( 1  1  0  0 )
        ( 1  0  0  0 )

is given in Exercise 11.1.7.

We can use partial fractions to determine c_n, but before doing so, we'd like to note that (11.6)
gives us an easier recursion for calculating c_n: From (11.6) we have C(x) = 1 - x + (3x + x^2 - x^3)C(x).
Looking at the coefficient of x^n on both sides, we have

    c_0 = 1,       c_1 = -1 + 3c_0 = 2,       c_2 = 3c_1 + c_0 = 7

and

    c_n = 3c_{n-1} + c_{n-2} - c_{n-3}    for n > 2.          (11.8)

Use this recursion to find c_8 and compare the amount of work with that when you were using
(11.4) and (11.5). It's also possible to derive (11.8) by manipulating (11.4) and (11.5) without using
generating functions. (Try doing it.)

To use partial fractions, we must factor the denominator of (11.6):

    1 - 3x - x^2 + x^3 = (1 - px)(1 - qx)(1 - rx).

Unfortunately, the cubic does not factor with rational p, q or r. Using numerical methods we found
that p = 3.21432..., q = .46081... and r = -.67513.... By partial fractions, c_n = Pp^n + Qq^n + Rr^n,
where P = .66459..., Q = .07943... and R = .25597.... Thus c_n is the integer closest to Pp^n.
Since we know p and P only approximately, we can't get c_n exactly for large n this way. Instead,
we'd use the recursion (11.8) to get exact values for c_n. On the other hand, the recursion gives us
no idea about how fast c_n grows, so we'd use our partial fraction result for this sort of information.
For example, the result says that, in some sense, the average number of choices per column is about
p since if there were exactly p choices per column, there would be p^n ways to place dominoes in n
columns. □
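The two uses described here, exact values from the three-term recursion and growth information from the largest root, can both be tried numerically. The sketch below is ours, not the text's: the function names are illustrative, and the root is found by plain bisection on t^3 - 3t^2 - t + 1, the polynomial obtained from 1 - 3x - x^2 + x^3 by substituting x = 1/t and multiplying by t^3, so that its roots are p, q and r.

```python
def domino_counts(nmax):
    """c_0..c_nmax from c_n = 3c_{n-1} + c_{n-2} - c_{n-3}, with c_0, c_1, c_2 = 1, 2, 7."""
    c = [1, 2, 7]
    for n in range(3, nmax + 1):
        c.append(3 * c[n - 1] + c[n - 2] - c[n - 3])
    return c[: nmax + 1]


def largest_root(lo=3.0, hi=4.0, steps=60):
    """Bisection for the root of t^3 - 3t^2 - t + 1 lying between lo and hi."""
    def f(t):
        return t ** 3 - 3 * t ** 2 - t + 1

    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:      # sign change in the left half
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


c = domino_counts(30)
p = largest_root()
print(c[8])                 # 7573
print(p, c[30] / c[29])     # the ratio of successive terms approaches p
```

The ratio c_n/c_{n-1} settles down quickly because the other two roots are less than 1 in absolute value.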


Example 11.3 Binary operations Let ∧ denote exponentiation; e.g., 3 ∧ 2 = 9. The interpretation
of 2 ∧ 3 ∧ 2 is ambiguous. We can remove the ambiguity by using parentheses:

    (2 ∧ 3) ∧ 2 = 8 ∧ 2 = 64

    2 ∧ (3 ∧ 2) = 2 ∧ 9 = 512.

Sometimes we get the same answer from different parenthesizations; e.g., 2 ∧ 2 ∧ 2 = 16, regardless
of parentheses.

Let's consider the possible values of

    0 ∧ 0 ∧ ... ∧ 0.                                          (11.9)

Unfortunately, 0 ∧ 0 is not defined for the real numbers. For the purpose of this example, we'll define
0 ∧ 0 = 1. The only other powers we need are well defined. In summary

    0 ∧ 0 = 1,   0 ∧ 1 = 0,   1 ∧ 0 = 1   and   1 ∧ 1 = 1.    (11.10)

You should be able to show by induction on the length of the string that the only possible values
for (11.9) are 0 and 1, regardless of how the expression is parenthesized.

How many of the parenthesizations of (11.9) give the expression the value 0 and how many give
it the value 1?

If there are n zeroes present, let z_n be the number of ways to obtain the value 0 and let w_n be
the number of ways to obtain 1. We can write down the generating functions by using the Rules of
Sum and Product: A zero is produced if n = 1 OR, by (11.10), if the left string is a zero AND the
right string is a one. By (11.10), the other three combinations give a one. Thus

    Z(x) = x + Z(x)W(x)
                                                              (11.11)
    W(x) = Z(x)Z(x) + W(x)Z(x) + W(x)W(x).

These  equations  are  more  complicated  than  our  earlier  ones  because  they  are  not  linear.  In  general, 
we  cannot  hope  to  solve  such  equations;  however,  these  can  be  solved.  Try  to  do  so  before  you 
continue. 

*       *       *       Stop  and  think  about  this!        *       *  * 

Let T(x) = W(x) + Z(x), the generating function for the total number of parenthesizations without
regard to value. Using algebra on (11.11) or, more simply, using a direct combinatorial argument,
you should be able to show that

    T(x) = x + T(x)T(x).

The solution to this equation is

    T(x) = (1 - \sqrt{1 - 4x})/2,                             (11.12)

where the minus sign was chosen on the square root because T(0) = t_0 = 0. We now rewrite
Z(x) = x + Z(x)W(x) as

    Z(x) = x + Z(x)(T(x) - Z(x)) = x + T(x)Z(x) - Z(x)^2.     (11.13)

Solving this quadratic for Z(x):

    Z(x) = (T(x) - 1 + \sqrt{(1 - T(x))^2 + 4x})/2,           (11.14)

where the root with the plus sign was chosen because Z(0) = 0.

Of course, we can substitute (11.12) into (11.14) to obtain the explicit formula for Z(x).
Unfortunately, we are unable to extract a nice formula for z_n from the result.

What can be done? At present, not much; however, in Example 11.31 (p. 351), we will obtain
estimates of z_n for large n.

Note that T(x) = B(x), the generating function for unlabeled full binary RP-trees. We obtained
a differential equation for B in Example 10.8 (p. 284). This led to a simple recursion. There are general
techniques for obtaining such differential equations and hence recursions, but they are beyond the
scope of this text. We merely remark that it is possible to obtain a recursion for z_n that requires
much less work than would be involved in simply using the recursion that is obtained by extracting
the coefficient of x^n in (11.13). □
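Even without a nice formula, the first few values of z_n and w_n are easy to tabulate. The sketch below is an illustration of ours, with a hypothetical function name: it extracts coefficients from the pair of equations Z = x + ZW and W = ZZ + WZ + WW by summing over the number k of zeroes in the left-hand factor, and then checks that z_n + w_n is the number of ways to parenthesize n factors (the Catalan number with index n - 1).

```python
from math import comb


def zero_one_counts(nmax):
    """Return lists z, w with z[n], w[n] = number of parenthesizations giving 0, 1."""
    z = [0, 1] + [0] * (nmax - 1)    # a single 0 has value 0, so z[1] = 1
    w = [0, 0] + [0] * (nmax - 1)
    for n in range(2, nmax + 1):
        for k in range(1, n):        # left factor has k zeroes, right factor n - k
            z[n] += z[k] * w[n - k]                      # 0 ^ 1 = 0
            w[n] += (z[k] * z[n - k]                     # 0 ^ 0 = 1
                     + w[k] * z[n - k]                   # 1 ^ 0 = 1
                     + w[k] * w[n - k])                  # 1 ^ 1 = 1
    return z, w


z, w = zero_one_counts(10)
print(z[1:6], w[1:6])   # [1, 0, 1, 1, 5] [0, 1, 1, 4, 9]
assert all(z[n] + w[n] == comb(2 * n - 2, n - 1) // n for n in range(1, 11))
```

This direct approach costs on the order of n^2 multiplications to reach z_n, which is the amount of work the closing remark of the example says can be reduced.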


Exercises 


11.1.1.  Derive (11.8) directly from (11.4) and (11.5) without using generating functions.

11.1.2.  Redo  Exercise  10.2.4  (p.  280)  using  the  directed  graph  method  of  Example  11.2.  Which  way  was 
easier? 

Hint.  Here's  one  way  to  set  it  up.  Introduce  three  vertices  to  indicate  the  present  digit:  zero,  one  and 
two.  You  can  introduce  a  bad  vertex,  too,  or  make  certain  digits  impossible.  Since  you  can  end  at 
any  digit,  you'll  want  to  add  generating  functions  together  to  get  your  answer. 
11.1.3.  Let a_n be the number of ways to place dominoes on a 3 by n board with no blank spaces. Note that
a_n = 0 when n is odd. Let A(x) be the generating function.

(a)  Prove that A(x) = (1 - x^2)/(1 - 4x^2 + x^4).

(b)  Show that a_n = 4a_{n-2} - a_{n-4} when n ≥ 4.

*(c)  Prove the previous recursion without the use of generating functions.

11.1.4.  How many n-long sequences of zeroes, ones and twos are there with no adjacent entries equal?

11.1.5.  Let L be a set of strings and let a_n be the number of strings in L of length n. Suppose that we
are given a nondeterministic finite automaton for recognizing precisely the strings in L. Describe a
method for using the automaton to get the generating function for the a_n's.

11.1.6.  Let a_{n,j}(2) be the number of ways to place exactly j dominoes on a 2 by n board. Extend the finite
machine approach of Example 11.2 so that you can calculate \sum_{n,j} a_{n,j}(2) x^n y^j.

Hint.  Put on the edges of the digraph information about the number of dominoes added to the board.
Use it to write down the equations relating the generating functions.

11.1.7.  Given a finite machine as in Example 11.2, let m_{x,y} be the number of edges from state x to y. This
defines a matrix M as in (11.7).

(a)  Show that m^{(n)}_{x,y}, the (x, y) entry in M^n, is the number of ways to get from x to y in exactly n
steps.

(b)  Let a be a row vector with a_k = 1 if k is an accepting state and a_k = 0 otherwise. Define i
similarly for the initial state. Show that i M^n a^t is the number of ways to get from the initial
state to an accepting state in exactly n steps. (We use a^t for the column vector which is the
transpose of the row vector a.)

(c)  Conclude that the generating functions in Example 11.2 and in Exercise 11.1.5 have the form
i(I - xM)^{-1} a^t, where I is the identity matrix.

(d)  Extend the definition of M to include the previous exercise.

11.1.8.  Let a_n be the number of ways to distribute unlabeled balls into boxes numbered 1 through n such
that for each j with 1 ≤ j < n, the number of balls in boxes j and j + 1 together is at most 2.

(a)  Construct a directed graph that can be used to calculate the generating function for a_n.
Hint.  Let the vertices (states) be the number of balls in a box.

(b)  Obtain a set of linked equations for the various generating functions.

(c)  Show that \sum_n a_n x^n = (1 + x - x^2)/(1 - 2x - x^2 + x^3).

(d)  Restate the solution in matrix terms using the previous exercise.

(e)  Find the roots of r^3 - 2r^2 - r + 1 = 0 numerically and use partial fractions to determine the
value of a_n.

(f)  Let b_{n,k} be the number of ways to distribute k balls into n boxes subject to our adjacency
condition. Using the idea in Exercise 11.1.6, determine the generating function for b_{n,k}.




11.1.9.  How one approaches a problem can be very important. Obviously, the wrong attack on a problem
may result in no solution. Less obvious is the fact that one approach may involve much less work
than another. Everyone sometimes fails to find the easiest approach. You can probably find examples
of this in the way you've solved some homework problems. Here's another brief example. How many
n long sequences of zeroes and ones contain the pattern p = 010101011?

(a)  One approach is to use two of the tools we've developed: designing finite automata for accepting
sequences with certain patterns and obtaining generating functions from automata. This would
require an automaton with at least ten states. Draw such an automaton for p and write down
the family of associated equations.

(b)  We'll look at a simpler approach. Let a_n be the number of n long strings that do not contain
the desired pattern p. By considering what happens when something is added to the end of an
n - 1 long pattern, show that 2a_{n-1} = a_n + c_n for n > 0, where c_n is the number of n long
strings s_1, ..., s_n containing p but with p not contained in s_1, ..., s_{n-1}. Also show that c_n = 0
for n < 9, the length of p.

(c)  Show that c_n = a_{n-9} and conclude that A(x) = (1 - 2x + x^9)^{-1}.
Hint.  If your proof also works for 001001001, it is not quite correct.

(d)  Generalize the previous result as much as you can.

(e)  Show that the finite automata approach might sometimes be better by giving an automaton for
recognizing the strings that contain the pattern 001001001.

11.1.10.  The situation we studied in Example 11.3 can be generalized in various ways. In this exercise you
will study some possibilities.

(a)  Suppose we have a set A of symbols and a binary operation ∘ on A. This means that for all
s, t ∈ A the value of s ∘ t ∈ A is given. Consider the "product" b ∘ b ∘ ... ∘ b. We want to know
how many ways each of the elements s ∈ A can arise by parenthesizing this product. Describe
carefully how to obtain the equations relating the generating functions from the "multiplication"
table for the operation ∘.

(b)  We can change the previous situation by letting ∘ be a k-ary operation. For example, if it is a
ternary operation, there is no way to make sense of either the expression b ∘ b or the expression
b ∘ b ∘ b ∘ b. On the other hand, we have three parenthesizations for the 5-long case:

    (b ∘ b ∘ b) ∘ b ∘ b,    b ∘ (b ∘ b ∘ b) ∘ b    and    b ∘ b ∘ (b ∘ b ∘ b).

Again, describe how to construct equations relating the generating functions.
11.2    Exponential  Generating  Functions


When we use ordinary generating functions, the parts we are counting are "unlabeled." It may
appear at first sight that this was not so in all the applications of ordinary generating functions.
For instance, we had sequences of zeroes and ones. Isn't this labeling the positions in a sequence?
No, it's dividing them into two classes. If they were labeled, we would require that each entry in the
sequence be different; that is, each label would be used just once. Well, then, what about placing balls
into labeled boxes? Yes, the boxes are all different, but the parts we are counting in our generating
functions are the unlabeled balls, not the boxes. The boxes simply help to form the structure.

In  this  section,  we'll  use  exponential  generating  functions  to  count  structures  with  labeled  parts. 
What  we've  said  is  rather  vague  and  may  have  left  you  confused.  We  need  to  be  more  precise,  so 
we'll  look  at  a  particular  example  and  then  explain  the  general  framework  that  it  fits  into. 

Recall  the  problem  of  counting  unlabeled  full  binary  RP-trees  by  number  of  leaves.  We  said  that 
any  such  tree  could  be  thought  of  as  either  a  single  vertex  OR  an  ordered  pair  of  trees.  Let's  look  at 
the  construction  of  this  ordered  pair  a  bit  more  closely.  If  the  final  tree  is  to  have  n  leaves,  we  first 
choose  some  number  k  of  leaves  and  construct  a  tree  with  that  many  leaves,  then  we  construct  a  tree 
with  n  —  k  leaves  as  the  second  member  of  the  ordered  pair.  Thus  there  is  a  three  step  procedure: 
1.  Determine the number of leaves for the first tree (and hence also the second), say k.

2.  Construct  the  first  tree  so  that  it  contains  k  leaves. 

3.  Construct  the  second  tree  so  that  it  contains  n  —  k  leaves. 

Now let's look at what happens if the leaves are to be labeled; i.e., there is a bijection from the
n leaves to some set N of n labels. (Usually we have N = n, but this need not be so.) In this case,
we must replace our first step by a somewhat more complicated step and modify the other two steps
in an obvious manner:

1'.  Determine a subset K of N which will become the labels of the leaves of the first tree.

2'.  Construct the first tree so that its leaves use K for labels.

3'.  Construct the second tree so that its leaves use N - K for labels.

Note that the number of ways to carry out Steps 2' and 3' depends only on |N| and |K|, not on the
actual elements of N and K. This is crucial for the use of exponential generating functions. Because
of this, it is convenient to split Step 1' into two steps:

1'a.  Determine the number of leaves for the first tree (and hence also the second), say k.
1'b.  Determine a subset K of N with |K| = k to be the labels of the leaves of the first tree.

Let b_n be the number of unlabeled full binary RP-trees with n leaves and let t_n be the number
of such trees except that the leaves have been labeled using some set N with |N| = n. For n > 1,
our unlabeled construction gives us b_n = \sum_k b_k b_{n-k}, where the summation comes from the first
step, the b_k from the second and the b_{n-k} from the third. Similarly, for the labeled construction,
we have

    t_n = \sum_{k=0}^{n} \binom{n}{k} t_k t_{n-k}    for n > 1,            (11.15)

where now the summation comes from Step 1'a and the binomial coefficient from Step 1'b.

The initial condition t_1 = 1 (and, if we want, t_0 = 0) together with (11.15) gives us a recursion
for t_n. We'd like to use generating functions to solve this recursion as we did for the unlabeled case.
The crucial step for the unlabeled case was the observation that we were dealing with a convolution
which then led to (10.18). If this approach is to work in the labeled case, we need to be able to view
(11.15) as a convolution, too.

Recall that a convolution is something of the form \sum_k a_k c_{n-k}. Unfortunately, (11.15) contains
\binom{n}{k} which is not a function of k and is not a function of n - k, so it can't be included in a_k or in
c_{n-k}. Fortunately, we can get around this by rewriting the binomial coefficient in terms of factorials:

    t_n = \sum_{k=0}^{n} \binom{n}{k} t_k t_{n-k} = \sum_{k=0}^{n} n! (t_k/k!) (t_{n-k}/(n-k)!).

If we divide this equation by n!, we get a recursion for t_n/n! in which the sum is a convolution. Thus,
the generating function B(x) = \sum b_n x^n in the unlabeled case should be replaced by the generating
function T(x) = \sum (t_n/n!) x^n in the labeled case. We can then proceed to solve the problem just as
we did in the unlabeled case.

You may have noticed that t_n = b_n n!, a result which can be proved directly. So why go through
all this? We did it to introduce an idea, not to solve a particular problem. So let's formulate the
idea.
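The relation between the labeled and unlabeled counts is easy to confirm numerically from the two recursions. The following Python sketch is ours (the function names are illustrative); it assumes t_0 = b_0 = 0 and t_1 = b_1 = 1, so the k = 0 and k = n terms of each sum vanish and are skipped.

```python
from math import comb, factorial


def unlabeled_tree_counts(nmax):
    """b_n = sum_k b_k b_{n-k} for n > 1, with b_0 = 0 and b_1 = 1."""
    b = [0, 1]
    for n in range(2, nmax + 1):
        b.append(sum(b[k] * b[n - k] for k in range(1, n)))
    return b


def labeled_tree_counts(nmax):
    """t_n = sum_k C(n,k) t_k t_{n-k} for n > 1, with t_0 = 0 and t_1 = 1."""
    t = [0, 1]
    for n in range(2, nmax + 1):
        t.append(sum(comb(n, k) * t[k] * t[n - k] for k in range(1, n)))
    return t


b = unlabeled_tree_counts(10)
t = labeled_tree_counts(10)
assert all(t[n] == b[n] * factorial(n) for n in range(11))
print(t[:5])   # [0, 1, 2, 12, 120]
```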

Definition 11.1 Exponential generating function The exponential generating function
for the sequence c is

    \sum_{n≥0} c_n (x^n/n!).

If \mathcal{T} is some set of structures and w(T) is the number of labels in T, then E_\mathcal{T}(x), the
exponential generating function for \mathcal{T}, is \sum_{T ∈ \mathcal{T}} x^{w(T)}/(w(T))!. We abbreviate
"exponential generating function" to EGF.
Theorem 11.1 Rule of Sum Suppose a set \mathcal{T} of structures can be partitioned into sets
\mathcal{T}_1, ..., \mathcal{T}_j so that each structure in \mathcal{T} appears in exactly one \mathcal{T}_i. It then follows that

    E_\mathcal{T} = E_{\mathcal{T}_1} + ... + E_{\mathcal{T}_j}.

Theorem 11.2 Rule of Product Suppose each structure in a set \mathcal{T} of structures can be
constructed from an ordered partition (K_1, K_2) of the labels and some pair (T_1, T_2) of structures
using the labels K_1 in T_1 and K_2 in T_2 such that:

(i)  The number of ways to choose a T_i with labels K_i depends only on i and |K_i|.

(ii)  Each structure T ∈ \mathcal{T} arises in exactly one way in this process.

(We allow the possibility of K_i = ∅ if \mathcal{T}_i contains structures with no labels.) It then follows that

    E_\mathcal{T}(x) = E_1(x)E_2(x),

where E_i(x) = \sum_{n≥0} t_{i,n} x^n/n! and t_{i,n} is the number of ways to choose T_i with labels n.

Proof: You should be able to prove the Rule of Sum. The Rule of Product requires more work.
Let t_n be the number of T ∈ \mathcal{T} with w(T) = n. By the assumptions,

    t_n = \sum_{k=0}^{n} \binom{n}{k} t_{1,k} t_{2,n-k}

and so

    t_n/n! = (1/n!) \sum_{k=0}^{n} \binom{n}{k} t_{1,k} t_{2,n-k} = \sum_{k=0}^{n} (t_{1,k}/k!) (t_{2,n-k}/(n-k)!),

a convolution. Multiply by x^n, sum over n and rearrange to obtain E_\mathcal{T}(x) = E_1(x)E_2(x). □

If  you  compare  these  theorems  with  those  given  for  ordinary  generating  functions  in  Section  10.4, 
you  may  notice  some  differences. 

•  First,  the  Rule  of  Product  was  stated  here  for  a  sequence  of  only  two  choices.  This  is  not 
an  essential  difference — you  can  remove  the  constraint  at  the  expense  of  a  more  cumbersome 
statement  of  the  theorem  or  you  can  simply  iterate  the  theorem:  divide  things  into  two  choices, 
then  subdivide  the  second  choice  in  two  and  so  on. 

•  The second difference seems more substantial: There is no parallel to condition (iii) of the
ordinary generating function version, Theorem 10.4 (p. 292). This is because it is implicitly built
into the statement of the theorem: the EGF counts by total number of labels, so it follows that
if T comes from T_1 and T_2, then w(T_1) + w(T_2) = w(T).

•  Finally,  neither  of  the  theorems  here  mentions  either  an  infinite  number  of  blocks  in  the  partition 
(Rule  of  Sum)  or  an  infinite  number  of  steps  (Rule  of  Product).  Again,  this  is  not  a  problem — 
infinitely  many  is  allowed  here,  too. 
Example 11.4  Counting derangements  Recall that a derangement is a permutation with no
fixed points and that $D_n$ denotes the number of derangements on an $n$ element set. It is convenient
to say that there is one permutation of the empty set and that it is a derangement because it does
not map anything to itself. By splitting off the fixed points from an arbitrary permutation, say $n - k$
of them, we have

\[ n! = \sum_{k=0}^{n} \binom{n}{k} D_k. \tag{11.16} \]

We can manipulate this to obtain an EGF for $D_n$, but it is easier to go directly from the combinatorial
argument to the Rule of Product. We'll do that now.

Let $D(x)$ be the EGF for derangements by number of things deranged. A permutation of $\underline{n}$
can be constructed by choosing some subset $K$ of $\underline{n}$, constructing a derangement of $K$ and fixing
the elements of $\underline{n} - K$. Every permutation of $\underline{n}$ arises exactly once this way. We make three simple
observations:

• The EGF for all permutations is $\sum_{n \ge 0} n!\,x^n/n! = 1/(1-x)$.

• Since there is just one permutation fixing all the elements of a given set, the EGF for permutations
with all elements fixed is $\sum_{n \ge 0} x^n/n! = e^x$.

• By the Rule of Product, $1/(1-x) = D(x)e^x$ and so

\[ D(x) = \frac{e^{-x}}{1-x}. \tag{11.17} \]

Note that we never needed to write down (11.16). In Example 10.7 (p. 283) and Exercise 10.3.1
(p. 291), (11.17) was used to obtain simple recursions for $D_n$.

The simplicity of this derivation and the ones in the examples that follow illustrate the power
of the Rule of Product. □
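
As a sanity check on (11.17), the coefficients of $e^{-x}/(1-x)$ can be extracted exactly and compared with a brute-force count. The following is only an illustrative sketch; the function names are ours and do not come from the text.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def derangements_from_egf(n_max):
    """Return [D_0, ..., D_n_max] read off from D(x) = e^{-x}/(1-x).

    Multiplying a series by 1/(1-x) takes partial sums of its coefficients,
    so [x^n] D(x) = sum_{k<=n} (-1)^k / k!, and D_n is n! times that.
    """
    result, partial = [], Fraction(0)
    for n in range(n_max + 1):
        partial += Fraction((-1) ** n, factorial(n))
        result.append(int(partial * factorial(n)))
    return result


def derangements_brute(n):
    """Count permutations of range(n) that have no fixed point."""
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))
```

For small $n$ the two functions agree, which is the content of the Rule of Product argument above.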

Example 11.5  Sequences of letters  How many $n$ long sequences can be formed from A, B
and C so that the number of A's in the sequence is odd and the number of B's in the sequence is
odd? The labeled objects are simply the positions in the sequence. We form an ordered partition of
the labels $\underline{n}$ into three parts, say $(P_A, P_B, P_C)$. The letter A is placed in all the positions that are
contained in the set $P_A$, and similarly for B and C. This is just the setup for the Rule of Product.
If $|P_A|$ is odd, we can place the A's in just one way, while if $|P_A|$ is even, we cannot place them
because of the requirement that the sequence contain an odd number of A's. Thus the EGF for the
A's is

\[ \sum_{k \text{ odd}} \frac{x^k}{k!} = \frac{e^x - e^{-x}}{2}. \]

This is also the EGF for the B's. For the C's we have $\sum_k x^k/k! = e^x$. Thus the EGF for our sequences
is

\[ \left(\frac{e^x - e^{-x}}{2}\right)^{2} e^x = \frac{e^{3x} - 2e^x + e^{-x}}{4}, \]

and so the answer is $(3^n - 2 + (-1)^n)/4$. You might like to try to find a direct counting argument
for this result.

This could also have been done with ordinary generating functions and multisection of series:
Keep track of the number of A's, the number of B's and the length of the sequence using three
variables in the generating function. Then use multisection twice to ensure an odd number of A's
and an odd number of B's. Finally, set the variables that are keeping track of the A's and B's equal
to 1. We will not carry out the details. □
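
A brute-force enumeration gives a quick check of the closed form $(3^n - 2 + (-1)^n)/4$ obtained from the EGF $\bigl((e^x - e^{-x})/2\bigr)^2 e^x$. This is a minimal sketch with function names of our own choosing.

```python
from itertools import product


def count_odd_a_odd_b(n):
    """Brute force: n-long words over {A, B, C} with an odd number of A's and of B's."""
    return sum(
        1
        for w in product("ABC", repeat=n)
        if w.count("A") % 2 == 1 and w.count("B") % 2 == 1
    )


def closed_form(n):
    """n! times the coefficient of x^n in (e^{3x} - 2e^x + e^{-x})/4."""
    return (3 ** n - 2 + (-1) ** n) // 4
```

For example, when $n = 2$ the only admissible words are AB and BA, and the closed form gives $(9 - 2 + 1)/4 = 2$.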
As noted in the last example, some problems can be done with both ordinary and exponential
generating  functions.  In  such  cases,  it  is  usually  clear  that  one  method  is  easier  than  the  other.  In 
some  other  problems,  it  is  necessary  to  use  generating  functions  that  are  simultaneously  exponential 
and  ordinary.  This  happens  because  one  class  of  objects  we're  keeping  track  of  has  labels  and  the 
other  class  does  not.  Here's  an  example  of  this. 

Example 11.6  Words from a collection of letters  In Example 1.11 (p. 13) we considered
the problem of counting strings of letters of length $k$, where the letters can be repeated but the
number of repetitions is limited. Specifically, we used the letters in the word ERROR but insisted
no letter could be used more often than it appears. We suggest that you review Example 1.11 (p. 13)
and the improved methods in Examples 1.19 (p. 24) and 3.3 (p. 69), where we used the letters in
ERRONEOUSNESS. Imagine letters that are labeled, each by its position in the word. Since there
are three E's and only one word of any length can be built with just E's, the EGF for words of E's
is $1 + x + x^2/2 + x^3/6$. Choose E's and R's and O's and N's and U's and S's. In this way, the generating
function for words is

\[
\begin{aligned}
&\left(1 + x + \frac{x^2}{2!} + \frac{x^3}{3!}\right)^{2} \left(1 + x + \frac{x^2}{2!}\right)^{3} (1 + x) \\
&\quad = 1 + 6x + \frac{35x^2}{2} + \frac{197x^3}{6} + \frac{265x^4}{6} + 45x^5 + \frac{322x^6}{9} + \frac{811x^7}{36} \\
&\qquad + \frac{541x^8}{48} + \frac{40x^9}{9} + \frac{389x^{10}}{288} + \frac{29x^{11}}{96} + \frac{13x^{12}}{288} + \frac{x^{13}}{288},
\end{aligned}
\]

where the multiplication was done by a symbolic manipulation package. Multiplying the coefficient
of $x^n$ by $n!$ gives the number of $n$-long words. Hence the number of 8-long words is 454,440. □
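
The multiplication does not need a symbolic package; exact rational arithmetic is enough. The sketch below multiplies one truncated exponential $1 + x + \cdots + x^m/m!$ per distinct letter, where $m$ is how often the letter occurs in the source word. The helper names are ours and purely illustrative.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def truncated_exp(m):
    """Coefficients of 1 + x + ... + x^m/m!, the EGF for a letter usable at most m times."""
    return [Fraction(1, factorial(i)) for i in range(m + 1)]


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def word_counts(source):
    """Entry n is the number of n-long words using each letter at most as often as in source."""
    egf = [Fraction(1)]
    for multiplicity in Counter(source).values():
        egf = poly_mul(egf, truncated_exp(multiplicity))
    # Multiply the coefficient of x^n by n! to undo the exponential scaling.
    return [int(c * factorial(n)) for n, c in enumerate(egf)]
```

With this sketch, `word_counts("ERRONEOUSNESS")[8]` reproduces the count of 8-long words quoted in the example.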

Example  11.7  Set  partitions  We  want  to  count  the  number  of  partitions  of  a  set,  keeping 
track  of  the  size  of  the  set  that  is  being  partitioned  and  the  number  of  blocks  in  the  partition.  Since 
the  elements  of  the  set  are  distinct,  they  are  labeled  and  so  we  will  use  an  EGF  to  count  them. 
On  the  other  hand,  the  blocks  of  a  partition  are  not  labeled,  so  it  is  natural  to  use  an  ordinary 
generating  function  to  count  blocks. 

Let's start by looking at partitions with one block. There is just one such, so the EGF is
$\sum_{n>0} x^n/n! = e^x - 1$.

What about partitions with two blocks? We can use the Rule of Product. In fact, the statement
of the Rule of Product has built into it partitions of the set into two blocks $K$ and $L$. Thus the
EGF should be $(e^x - 1)^2$. This is not quite right because these blocks are ordered but the blocks of
a partition of a set are supposed to be unordered. As a result, we must divide by 2!.

You should have no trouble showing that the number of partitions of a set that have exactly $k$
blocks has the EGF $(e^x - 1)^k/k!$.

Recall that $S(n,k)$, the Stirling number of the second kind, is the number of partitions of an $n$
element set into exactly $k$ blocks. By the previous paragraph,

\[ \sum_{n} S(n,k)\,\frac{x^n}{n!} = \frac{(e^x - 1)^k}{k!}. \tag{11.18} \]

Let $S(0,0) = 1$. It follows that

\[ \sum_{n,k} S(n,k)\,\frac{x^n}{n!}\,y^k = \sum_{k=1}^{\infty} \frac{y^k (e^x - 1)^k}{k!} + S(0,0) = \exp\bigl(y(e^x - 1)\bigr). \]

($\exp(z)$ is another notation for $e^z$.) Call this $A(x,y)$.

The formula for $A(x,y)$ can be manipulated in various ways to obtain recursions, formulas and
other relations for $S(n,k)$. Of particular interest is the total number of partitions of a set, which is
given by $B_n = \sum_k S(n,k)$. You should be able to see that we can obtain its generating function by
setting $y = 1$. Thus

\[
\begin{aligned}
\sum_{n} B_n \frac{x^n}{n!} = A(x,1) &= \exp(e^x - 1) \\
&= e^{-1}\exp(e^x) = \frac{1}{e}\sum_{i=0}^{\infty} \frac{e^{ix}}{i!} = \frac{1}{e}\sum_{i=0}^{\infty} \frac{1}{i!}\sum_{n=0}^{\infty} \frac{(ix)^n}{n!} \\
&= \frac{1}{e}\sum_{n=0}^{\infty} \frac{x^n}{n!} \sum_{i=0}^{\infty} \frac{i^n}{i!}.
\end{aligned}
\]

Thus  we  obtain 


Theorem 11.3  Dobinski's formula  The number of partitions of $\underline{n}$ is

\[ B_n = \frac{1}{e}\sum_{i=0}^{\infty} \frac{i^n}{i!}. \]
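
Dobinski's formula is easy to test numerically. The sketch below compares a truncated version of the infinite sum with Bell numbers computed from the standard recurrence $B_{n+1} = \sum_k \binom{n}{k} B_k$. The truncation point (`terms=60`) and the function names are our own assumptions; the truncated sum is only a floating point approximation and is adequate for small $n$.

```python
from fractions import Fraction
from math import comb, e, factorial


def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max] via B_{n+1} = sum_k C(n, k) * B_k."""
    bell = [1]
    for n in range(n_max):
        bell.append(sum(comb(n, k) * bell[k] for k in range(n + 1)))
    return bell


def dobinski(n, terms=60):
    """Truncated Dobinski sum (1/e) * sum_{i < terms} i^n / i!, returned as a float."""
    total = sum(Fraction(i ** n, factorial(i)) for i in range(terms))
    return float(total) / e
```

Rounding `dobinski(n)` to the nearest integer recovers $B_n$ for the small values of $n$ we tried.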


Example  11.8  Mixed  generating  functions  Suppose  we  are  keeping  track  of  both  labels  and 
unlabeled  things.  For  example,  we  might  count  set  partitions  keeping  track  of  both  the  size  of  the 
set  and  the  number  of  blocks.  We  state  without  proof 

Theorem 11.4  Mixed Generating Functions  If we are keeping track of more than one
thing, some of which are labeled and some of which are unlabeled, then the Rules of Sum
and Product still apply. For the Rule of Product, the labels must satisfy the conditions in
Theorem 11.2 and the remaining weights must satisfy the conditions in Theorem 10.4 (p. 292).

Returning to the set partition situation, if $y$ keeps track of the number of blocks, then (11.18)
becomes

\[ \sum_{n} S(n,k)\,\frac{x^n y^k}{n!} = \frac{y^k (e^x - 1)^k}{k!} \]

and so the generating function is $\sum_{n,k} S(n,k)\,x^n y^k/n! = \exp(y(e^x - 1))$. □

The  Exponential  Formula 


Before  stating  the  Exponential  Formula,  we'll  look  at  a  special  case. 
Example 11.9  Connected graphs  Let $g_n$ be the number of simple graphs with vertex set $\underline{n}$
and let $c_n$ be the number of such graphs which are connected. Recall that a simple graph is a graph
whose edge set is a subset of $\mathcal{P}_2(V)$. Since each pair of vertices is either an edge or not and there
are $\binom{n}{2}$ pairs of vertices, it follows that $g_n = 2^{\binom{n}{2}}$. (This is not $2\binom{n}{2}$; the $\binom{n}{2}$ is an exponent.) What
is the value of $c_n$?

This  problem  can  be  approached  in  various  ways.  We'll  look  at  it  in  a  way  that  can  be  generalized 
considerably  so  that  we'll  be  able  to  obtain  other  results.  The  basic  idea  is  to  view  a  graph  as  being 
built  from  connected  components.  We  must  either 

•  figure  out  how  a  graph  decomposition  into  connected  components  translates  into  a  generating 
function  or 

• find some way to distinguish a component so we can split off one component and thus proceed
in a recursive fashion.

We'll  follow  the  latter  approach.  In  order  to  distinguish  a  component,  we  will  root  the  graph. 

Let $G(x)$ and $C(x)$ be the EGFs for $g_n$ and $c_n$. It will be convenient to take $g_0 = 1$ and $c_0 = 0$.

Imagine rooting the graph by choosing some vertex to be the root. There are $n$ distinct ways
to do this and so there are $n g_n$ such graphs. Thus the generating function for the rooted graphs is
$xG'(x)$.

We can construct a rooted graph by choosing a rooted component and then choosing the rest
of the graph. This is just the setup for the Rule of Product. Rooted components have the EGF
$xC'(x)$. The rest of the rooted graph is simply a graph and so its generating function is $G(x)$. (Note
that this works even when the rest of the rooted graph is empty because we have $g_0 = 1$.) We have
proved that

\[ xG'(x) = \bigl(xC'(x)\bigr)\,G(x). \tag{11.19} \]

Ignoring questions of convergence, as we usually do, we can easily solve this differential equation by
separation of variables to obtain

\[ C(x) = \ln(G(x)) + A, \]

where the constant $A$ needs to be determined. (Here's how separation of variables works in this case.
We have $x\,dG/dx = (x\,dC/dx)\,G$ and so $dG/G = dC$, which we integrate.)

Since $C(0) = c_0 = 0$ and $G(0) = g_0 = 1$, it follows that $A = 0$. Thus we have $G(x) = \exp(C(x))$.
It would be nice if this formula led to a simple method for calculating $C(x)$. This is not the case; in
fact, it is easier to equate coefficients of $x^n$ in (11.19) and rearrange it into a (rather messy) recursion
for $c_n$. □
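
To make the recursion concrete: equating coefficients of $x^n/n!$ in $xG'(x) = (xC'(x))G(x)$ gives $n g_n = \sum_{k=1}^{n} \binom{n}{k}\, k c_k\, g_{n-k}$, and the $k = n$ term can be solved for $c_n$. The following sketch, with a function name of our own, carries this out.

```python
from math import comb


def connected_graph_counts(n_max):
    """Return [c_0, ..., c_n_max], the numbers of connected simple labeled graphs.

    Uses n*g_n = sum_{k=1}^{n} C(n, k) * k*c_k * g_{n-k} with g_n = 2^C(n, 2),
    solved for the k = n term (recall g_0 = 1 and c_0 = 0).
    """
    g = [2 ** comb(n, 2) for n in range(n_max + 1)]
    c = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        rest = sum(comb(n, k) * k * c[k] * g[n - k] for k in range(1, n))
        c[n] = (n * g[n] - rest) // n
    return c
```

For instance, of the $g_3 = 8$ graphs on three labeled vertices, this yields $c_3 = 4$ connected ones: the triangle and the three paths.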

The formula $G = e^C$ has been generalized considerably in the research literature. Here is one
form of it.

Theorem 11.5  Exponential Formula  Suppose that we have two sets $\mathcal{S}$ and $\mathcal{C}$ of labeled
structures. A structure is rooted by distinguishing one of its labels and calling it the root. Form
all possible rooted structures from those in $\mathcal{S}$ and call the set $\mathcal{S}^r$. Do the same for $\mathcal{C}$. If the Rule
of Product holds so that $E_{\mathcal{S}^r} = E_{\mathcal{C}^r} E_{\mathcal{S}}$, then

\[ E_{\mathcal{S}} = \exp(E_{\mathcal{C}}). \]

($\exp(z)$ is another notation for $e^z$.) The proof is the same as that for $G = e^C$.




Example 11.10  Graphs revisited  The previous example can be extended in two important
directions.

More parameters: We could include more variables in our generating functions to keep track of other
objects besides number of vertices. The basic requirement is that the number of such objects in
our rooted graph must equal the number in its rooted component plus the number in the rest of
the graph so that we still have the differential equation $x\,\partial G/\partial x = (x\,\partial C/\partial x)\,G$. (Of course, $G$ and $C$ now
contain other variables besides $x$ and so the derivative with respect to $x$ is a partial derivative.) You
should be able to easily see that we still have the solution $G = \exp(C)$, either from the differential
equation or from Theorem 11.5. Let's look at a couple of examples.

• We can keep track of the number of components in a graph. Let $g_{n,k}$ be the number of simple
graphs with $V = \underline{n}$ that have exactly $k$ components. The generating function is $G(x,y) =
\sum_{n,k} g_{n,k}\,(x^n/n!)\,y^k$ because the vertices are labeled but the components are not. Of course,
a connected graph has exactly one component. Thus $C(x,y) = C(x)y$. We have $G(x,y) =
\exp(C(x)y)$. Since $\exp(C(x)) = G(x)$, it follows that

\[ G(x,y) = G(x)^y. \tag{11.20} \]

What does this expression mean; that is, how should one interpret $G(x)^y$?

We can write $G(x) = 1 + xH(x)$ for some power series $H(x)$. By the binomial theorem for
arbitrary powers we have

\[ G(x)^y = \bigl(1 + xH(x)\bigr)^y = \sum_{k \ge 0} \binom{y}{k} x^k H(x)^k. \tag{11.21} \]

This expression makes perfectly good sense: We have

\[ \binom{y}{k} = \frac{y(y-1)\cdots(y-k+1)}{k!} \]

and if we want a coefficient, say that of $x^n$, we need only look at a finite number of terms,
namely those with $k \le n$. (You may wonder whether (11.21) is really the same as $G(x,y) =
\exp(yC(x))$. It is. This can be shown by formal power series manipulations.)

It may appear that (11.20) gives us something for nothing: just by knowing the number of
graphs by vertices we can determine the number of graphs by both vertices and components.
Unfortunately, we don't get this for nothing because calculation of numerical values for (11.20)
can involve a fair bit of work.
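
To give a feel for the work involved, here is a small sketch that tabulates $g_{n,k}$ from $G(x,y) = \exp(yC(x))$, using the fact that the coefficient of $y^k$ is $C(x)^k/k!$. The connected counts $c_n$ are obtained from the recursion implied by $xG'(x) = (xC'(x))G(x)$. The function name and the table layout are our own choices, not the text's.

```python
from fractions import Fraction
from math import comb, factorial


def graphs_by_components(n_max):
    """Return table with table[n][k] = number of simple graphs on n labeled
    vertices having exactly k components, for 0 <= n, k <= n_max."""
    g = [2 ** comb(n, 2) for n in range(n_max + 1)]
    # Connected counts c_n from n*g_n = sum_k C(n, k) * k*c_k * g_{n-k}.
    c = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        rest = sum(comb(n, k) * k * c[k] * g[n - k] for k in range(1, n))
        c[n] = (n * g[n] - rest) // n
    # Ordinary coefficients of C(x): [x^n] C(x) = c_n / n!.
    C = [Fraction(c[n], factorial(n)) for n in range(n_max + 1)]
    table = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    power = [Fraction(1)] + [Fraction(0)] * n_max  # coefficients of C(x)^0
    for k in range(n_max + 1):
        for n in range(n_max + 1):
            # [x^n y^k] exp(y*C(x)) = [x^n] C(x)^k / k!; multiply by n! for the EGF.
            table[n][k] = int(power[n] * factorial(n) / factorial(k))
        # power <- power * C(x), truncated at degree n_max.
        power = [sum(power[i] * C[n - i] for i in range(n + 1)) for n in range(n_max + 1)]
    return table
```

Each row of the table sums to $g_n = 2^{\binom{n}{2}}$, since every graph has some number of components.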

• We can keep track of the number of edges in a graph. Let $g_{n,q}$ be the number of simple graphs
with $V = \underline{n}$ that have exactly $q$ edges and define $c_{n,q}$ similarly. Since each of the $\binom{n}{2}$ elements
of $\mathcal{P}_2(V)$ may be an edge or not, $\sum_q g_{n,q} z^q = (1 + z)^{\binom{n}{2}}$. Thus

\[ G(x,z) = \sum_{n \ge 0} \frac{x^n}{n!} \sum_{q \ge 0} g_{n,q} z^q = \sum_{n=0}^{\infty} (1 + z)^{\binom{n}{2}}\,\frac{x^n}{n!} \]

and $C(x,z) = \ln(G(x,z))$. Unfortunately, no simple formula is known for the sum.

Special  collections  of  graphs:  We  can  limit  our  attention  to  particular  subsets  of  the  set  of  all  labeled 
graphs.  How  is  this  done? 

Let $\mathcal{C}$ be some set of connected graphs and let $\mathcal{G}(\mathcal{C})$ be all the graphs whose connected components
lie in this collection. Suppose the collection $\mathcal{C}$ satisfies the condition that the number of $n$
vertex graphs in the set $\mathcal{C}$ depends on $n$ and not on the labels of the vertices. In this situation, we
can still derive our equation $G = e^C$ and can still keep track of other objects besides vertices. Let's
look at some examples.

• Suppose that the connected components are complete graphs; i.e., every vertex in the component
is connected to every other. The only thing that distinguishes one component from another is
its set of vertices. Thus we can identify such a graph with a partition of its vertex set in which
each block corresponds to the vertex set of a component. This is a bijection between this class
of graphs and partitions of sets. We easily have $c_n = 1$ for all $n > 0$ and so $C(x) = e^x - 1$.
Consequently, $G(x) = \exp(e^x - 1)$. The number of components of such a graph corresponds to
the number of blocks in the partition. Thus $G(x,y) = \exp((e^x - 1)y)$, in the notation of (11.20), is
the generating function for the Stirling numbers of the second kind. Hence Example 11.7 is a
special case of our $G = e^C$ formula.

• Our next illustration, cycles of a permutation, will be a separate example. □

Example 11.11  Permutations and their cycles  Let $L$ be a set of positive integers and let
$s_n$ be the number of permutations of $\underline{n}$ such that all the cycle lengths are in $L$. Some examples are

(a) $L$ is all positive integers so $s_n$ counts all permutations;

(b) $L = \{2, 3, \ldots\}$ so $s_n$ counts derangements;

(c) $L = \{1, 2\}$ so $s_n$ counts involutions.

We'll obtain a generating function for $s_n$. In this case, the labels are simply the integers $\underline{n}$ that are
being permuted.

One approach is to draw a directed graph with vertex set $\underline{n}$ and an edge $(i,j)$ if the permutation
maps $i$ to $j$. This is a graph whose components are cycles and the lengths of the cycles are all in $L$.
We can then use the approach in the previous example. This is left for you to do. We'll "forget" the
previous example and go directly to the Exponential Formula.

We can construct a permutation by choosing its cycles. Let a structure in $\mathcal{C}$ be a cycle. The
parts of the structure are the things permuted by the cycle. Let a structure in $\mathcal{S}$ be a permutation.
When a permutation is rooted by choosing an element of $\underline{n}$, it breaks up into a rooted cycle and an
unrooted permutation. Thus the Exponential Formula can be used.

Let $c_n$ be the number of $n$ long cycles that can be made from $\underline{n}$. By the Exponential Formula,

\[ C(x) = E_{\mathcal{C}}(x) = \sum_{n \ge 1} c_n \frac{x^n}{n!} \]

and

\[ S(x) = E_{\mathcal{S}}(x) = e^{C(x)} = \exp\Bigl(\sum_{n \ge 1} c_n \frac{x^n}{n!}\Bigr). \]

We need to determine $c_n$. Since all cycle lengths must lie in $L$, $c_n = 0$ when $n \notin L$. Suppose
$n \in L$. To construct a cycle $f: \underline{n} \to \underline{n}$, we specify $f(1)$, $f^2(1) = f(f(1))$, $f^3(1), \ldots, f^{n-1}(1)$. Since we
want a cycle of length $n$, $f^n(1) = 1$ and the values $1, f(1), f^2(1), \ldots, f^{n-1}(1)$ are all distinct. Since
these are the only conditions, $f(1), f^2(1), \ldots, f^{n-1}(1)$ can be any permutation of $2, 3, \ldots, n$. Thus
$c_n = (n-1)!$. It follows that

\[ S(x) = \exp\Bigl(\sum_{k \in L} \frac{x^k}{k}\Bigr). \tag{11.22} \]

Let's reexamine the three examples of $L$ that were mentioned above. Let $y$ stand for the sum in
(11.22).

(a) When $L$ is all positive integers,

\[ y = \sum_{k=1}^{\infty} \frac{x^k}{k} = -\ln(1 - x) \]

and so $S(x) = 1/(1-x) = \sum_{n \ge 0} x^n = \sum_{n \ge 0} n!\,x^n/n!$. Hence $s_n = n!$, which we already
knew: there are $n!$ permutations of $\underline{n}$.

(b) For derangements, $y$ equals its value in (a) minus $x$ and so

\[ S(x) = \frac{e^{-x}}{1 - x}, \]

which we obtained in Example 11.4 (p. 318) by other methods.

(c) For involutions, $y = x + x^2/2$ and so

\[ S(x) = \sum_{k=0}^{\infty} \frac{(x + x^2/2)^k}{k!} = \sum_{k=0}^{\infty} \sum_{j=0}^{k} \frac{1}{k!}\binom{k}{j}\,x^{k-j}\Bigl(\frac{x^2}{2}\Bigr)^{j}. \]

Collecting the terms with $k + j = n$, we obtain

\[ \frac{s_n}{n!} = \sum_{j=0}^{\lfloor n/2 \rfloor} \frac{1}{j!\,2^j\,(n - 2j)!}, \]

which we obtained by a counting argument in Theorem 2.2 (p. 48). □
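
Formula (11.22) turns into a short program. Differentiating $S(x) = \exp\bigl(\sum_{k \in L} x^k/k\bigr)$ gives $S'(x) = S(x)\sum_{k \in L} x^{k-1}$, and comparing coefficients yields $s_n = \sum_{k \in L,\ k \le n} \frac{(n-1)!}{(n-k)!}\, s_{n-k}$. The sketch below uses this recurrence and checks it against direct enumeration; the function names, and the choice to pass $L$ as a Python set, are our own assumptions.

```python
from itertools import permutations
from math import factorial


def count_by_egf(n_max, allowed):
    """Return [s_0, ..., s_n_max] for S(x) = exp(sum_{k in allowed} x^k / k).

    Uses s_n = sum_{k in allowed, k <= n} (n-1)!/(n-k)! * s_{n-k}, which is
    the coefficient form of S'(x) = S(x) * sum_{k in allowed} x^(k-1).
    """
    s = [1]
    for n in range(1, n_max + 1):
        s.append(sum(factorial(n - 1) // factorial(n - k) * s[n - k]
                     for k in allowed if 1 <= k <= n))
    return s


def cycle_lengths(p):
    """Cycle lengths of a permutation p of range(len(p)), given as a tuple."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start not in seen:
            length, i = 0, start
            while i not in seen:
                seen.add(i)
                i = p[i]
                length += 1
            lengths.append(length)
    return lengths


def count_brute(n, allowed):
    """Count permutations of range(n) all of whose cycle lengths lie in allowed."""
    return sum(all(k in allowed for k in cycle_lengths(p))
               for p in permutations(range(n)))
```

With `allowed = {1, 2}` the recurrence reduces to $s_n = s_{n-1} + (n-1)s_{n-2}$, the familiar recursion for involutions.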

Example 11.12  Permutations and their cycles (continued)  We can keep track of other
information as well, provided it suffices to look at each cycle separately. In an extreme case, we
could keep track of the number of cycles of each length. This requires an infinite number of classes
of unlabeled parts, one for each cycle length. Let the associated variable for cycles of length $k$ be
$z_k$. The resulting generating function will be an EGF in the size of the set being permuted, but an
ordinary generating function in each $z_k$ since cycles are not labeled. You should be able to show that
the generating function is

\[ \exp\Bigl(\sum_{k \ge 1} \frac{z_k x^k}{k}\Bigr). \tag{11.23} \]

If we simply keep track of the size of the set being permuted and the number of cycles in the
permutation, the generating function is just the $G(x,y)$ of (11.20). You should be able to see why
this is so. If not, the second paragraph of the previous example may help. Another way to see it
is to use (11.23) with $z_k = y$ for all $k$. Let $z(n,k)$ be the number of permutations of $\underline{n}$ that have
exactly $k$ cycles. These are called the signless Stirling numbers of the first kind. We have just seen
that

\[ Z(x,y) = \sum_{n,k \ge 0} z(n,k)\,\frac{x^n}{n!}\,y^k = \Bigl(\frac{1}{1-x}\Bigr)^{y} = (1-x)^{-y}. \tag{11.24} \]

Equating coefficients of $x^n/n!$, we obtain

\[ \sum_{k} z(n,k)\,y^k = (-1)^n\,n!\,\binom{-y}{n} = y(y+1)\cdots(y+n-1). \]

We can use our generating function to study the expected number of cycles. The method for
doing so was worked out in (10.21) (p. 284). Review that equation. Now the random variable $X_n$ is the
number of cycles and our generating function is $Z(x,y)$ instead of $A(x,y)$ and it is exponential in $x$.
Since it is exponential, we should have a factor of $n!$ in both the numerator and denominator of (10.21)
but, since it appears in both, it cancels out. On with the calculations! $Z(x,1) = (1-x)^{-1} = \sum_{n \ge 0} x^n$,
so the denominator in (10.21) is 1. We have $Z_y(x,y) = -\ln(1-x)\,(1-x)^{-y}$ and so

\[ Z_y(x,1) = \frac{1}{1-x}\bigl(-\ln(1-x)\bigr) = \frac{1}{1-x}\sum_{k \ge 1} \frac{x^k}{k} \]

by Taylor's theorem for $-\ln(1-x)$. Finally $[x^n]\,Z_y(x,1) = \sum_{k=1}^{n} \frac{1}{k}$, which you can work out by
expanding (11.24) further or by using Exercise 10.1.6 (p. 274).
We've just worked out that the average number of cycles in a permutation of $\underline{n}$ is $\sum_{k=1}^{n} \frac{1}{k}$, a result
that was derived by other means in Example 2.8 (p. 47). Using Riemann sum approximations as we
did in deriving (10.30) from (10.29), it follows that the average number of cycles in a permutation
of $\underline{n}$ is $\ln n + O(1)$ as $n \to \infty$.

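Let's work out the variance using (10.22) (p. 285). We have $Z_{yy}(x,y) = (-\ln(1-x))^2\,(1-x)^{-y}$.
Proceeding as in the previous paragraph,

\[ [x^n]\,Z_{yy}(x,1) = \sum_{k=0}^{n} [x^k]\,(-\ln(1-x))^2 = \sum_{k=0}^{n} \sum_{i+j=k} \frac{1}{ij} = \sum_{i+j \le n} \frac{1}{ij}. \]

Thus

\[
\begin{aligned}
\operatorname{var}(X_n) &= \sum_{i+j \le n} \frac{1}{ij} + \sum_{i=1}^{n} \frac{1}{i} - \Bigl(\sum_{i=1}^{n} \frac{1}{i}\Bigr)^{2} \\
&= \sum_{i+j \le n} \frac{1}{ij} + E(X_n) - \sum_{i,j \le n} \frac{1}{ij} = E(X_n) - \sum_{\substack{i,j \le n \\ i+j > n}} \frac{1}{ij}.
\end{aligned}
\]

We've already done a lot of computations, so we'll just remark without proof that

\[ \sum_{\substack{i,j \le n \\ i+j > n}} \frac{1}{ij} \sim \int_0^1 \frac{-\ln(1-x)}{x}\,dx = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}. \]

Thus $\operatorname{var}(X_n) \sim E(X_n) \sim \ln n$. By Chebyshev's inequality (C.3) (p. 385), it's unlikely that
$|X_n - \ln n|/(\ln n)^{1/2}$ will be large. □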
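
The mean and variance can be checked exactly for small $n$. The sketch below expands $y(y+1)\cdots(y+n-1)$ to obtain the numbers $z(n,k)$ and then computes the moments of the number of cycles directly from the distribution $z(n,k)/n!$. The function names are ours.

```python
from fractions import Fraction
from math import factorial


def cycle_number_polynomial(n):
    """Coefficients of y(y+1)...(y+n-1); entry k is z(n, k)."""
    coeffs = [1]
    for m in range(n):
        # Multiply the current polynomial by (y + m).
        shifted = [0] + coeffs                # y * p(y)
        scaled = [m * a for a in coeffs] + [0]  # m * p(y)
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs


def mean_and_variance(n):
    """Exact mean and variance of the number of cycles of a random permutation of n things."""
    z = cycle_number_polynomial(n)
    total = factorial(n)
    mean = Fraction(sum(k * zk for k, zk in enumerate(z)), total)
    second = Fraction(sum(k * k * zk for k, zk in enumerate(z)), total)
    return mean, second - mean ** 2
```

For example, `cycle_number_polynomial(4)` returns `[0, 6, 11, 6, 1]`, so 11 of the 24 permutations of a 4-set have exactly two cycles.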

*Example 11.13  Permutations and their cycles (concluded)  You should review the discussion
of parity of a permutation in Definition 2.4 (p. 49) and Theorem 2.3. Let $L$ be a set of positive
integers. Let $e_n$ (resp. $o_n$) be the number of even (resp. odd) permutations of $\underline{n}$ all of whose cycle
lengths are in $L$. Let $p_n = e_n - o_n$. Using the previous example and Theorem 2.3(c), you should be
able to show that the exponential generating function for $p_n$ is given by

\[ P(x) = \exp\Bigl(\sum_{k \in L} \frac{(-1)^{k-1} x^k}{k}\Bigr). \]

Let's look at some special cases.

If $L$ is the set of all positive integers, then

\[ \sum_{k \in L} \frac{(-1)^{k-1} x^k}{k} = \sum_{k=1}^{\infty} \frac{(-1)^{k-1} x^k}{k} = \ln(1 + x) \]

and so $P(x) = 1 + x$. In other words, when $n > 1$, there are as many even permutations of $\underline{n}$ as
there are odd and so $e_n = o_n = n!/2$. We already knew this in Theorem 2.3.

If $L = \{2, 3, \ldots\}$, we are looking at derangements. In this case

\[ \sum_{k \in L} \frac{(-1)^{k-1} x^k}{k} = \sum_{k=1}^{\infty} \frac{(-1)^{k-1} x^k}{k} - x = \ln(1 + x) - x \]

and so $P(x) = (1 + x)e^{-x}$. With a little algebra, $p_n = (-1)^{n-1}(n - 1)$. Thus the number of even
and odd derangements of $\underline{n}$ differ by $n - 1$, and there are more even derangements if and only if $n$
is odd. This is a new result for us, and it's not clear how to go about proving it without generating
functions.

Suppose $L$ consists of all positive integers except $k \ge 2$. Reasoning as in the previous paragraph,
$P(x) = (1 + x)e^{\pm x^k/k}$, where the sign is plus when $k$ is even and minus when $k$ is odd. If you expand $P(x)$ in a power series,
you should see that $p_n \ne 0$ if and only if $n$ is a multiple of $k$ or one more than a multiple of $k$. We
have shown that, for $k \ge 2$, among the permutations of $\underline{n}$ with no $k$-cycles, the numbers of even and
odd permutations are equal if and only if neither $n$ nor $n - 1$ is a multiple of $k$. □
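
These parity statements are easy to test by enumeration for small $n$. The sketch below computes $p_n = e_n - o_n$ directly, using the standard fact that a $k$-cycle has sign $(-1)^{k-1}$, so a permutation of $n$ things with $c$ cycles has sign $(-1)^{n-c}$. The function names and the use of a predicate to describe $L$ are our own choices.

```python
from itertools import permutations


def cycle_lengths(p):
    """Cycle lengths of a permutation p of range(len(p)), given as a tuple."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start not in seen:
            length, i = 0, start
            while i not in seen:
                seen.add(i)
                i = p[i]
                length += 1
            lengths.append(length)
    return lengths


def even_minus_odd(n, allowed):
    """p_n = e_n - o_n over permutations of range(n) whose cycle lengths k all satisfy allowed(k)."""
    total = 0
    for p in permutations(range(n)):
        lengths = cycle_lengths(p)
        if all(allowed(k) for k in lengths):
            # The sign of the permutation is (-1)^(n - number of cycles).
            total += (-1) ** (n - len(lengths))
    return total
```

For derangements one would call `even_minus_odd(n, lambda k: k >= 2)` and compare the result with $(-1)^{n-1}(n-1)$.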

Example 11.14  Rooted labeled trees  Let's study $t_n$, the number of labeled rooted trees
with $V = \underline{n}$.

If we remove the root vertex from a rooted tree, making all of its sons new root vertices, we
obtain a graph all of whose components are rooted labeled trees. This process is reversible.

Let $T$ be the EGF for the $t_n$'s. By the Exponential Formula, the generating function for graphs
all of whose components are rooted labeled trees is $e^T$. Call the set of all such graphs $\mathcal{F}$. Thus $E_{\mathcal{F}} = e^T$. The
process we described in the previous paragraph constructs a rooted tree by

• partitioning the labels into $(K, L)$ with $|K| = 1$,

• assigning the label in $K$ to the new root,

• choosing an element $F \in \mathcal{F}$ with labels $L$, and

• joining the roots of the trees in $F$ to the new root.

By the Rule of Product, $T = xE_{\mathcal{F}} = xe^T$.

How can we obtain $t_n$ from the equation $T = xe^T$? There is a technique, called Lagrange
inversion, which can be useful in this situation.

Theorem 11.6  Lagrange Inversion  Let $T(y)$, $f(y)$ and $g(y)$ be power series with
$f(0) \ne 0$. Suppose that $T(x) = xf(T(x))$. Then the coefficient of $x^n$ in $g(T(x))$ is the coefficient
of $u^{n-1}$ in $g'(u)(f(u))^n/n$; that is, $[x^n]\,g(T(x)) = [u^{n-1}]\,\bigl(g'(u)f(u)^n/n\bigr)$.

Proofs and generalizations of Lagrange inversion are discussed at the end of this chapter.
In our particular case, $g(u) = u$ and $f(u) = e^u$. Thus

\[ t_n/n! = [u^{n-1}]\,e^{nu}/n = \bigl(n^{n-1}/(n-1)!\bigr)/n. \]

Thus $t_n = n^{n-1}$.

Incidentally, we chose the symbol $\mathcal{F}$ because graphs whose components are trees are called
forests. □
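
The value $t_n = n^{n-1}$ can be confirmed without Lagrange inversion by solving $T = xe^T$ as a truncated power series: starting from $T = 0$, each substitution $T \leftarrow x\exp(T)$ fixes at least one more coefficient. The sketch below does this with exact rational arithmetic. It is a check on the result, not an implementation of Theorem 11.6, and the helper names are ours.

```python
from fractions import Fraction
from math import factorial


def series_mul(p, q, n_max):
    """Product of two power series (coefficient lists), truncated at degree n_max."""
    out = [Fraction(0)] * (n_max + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= n_max:
                out[i + j] += a * b
    return out


def series_exp(p, n_max):
    """exp of a power series with zero constant term, truncated at degree n_max."""
    result = [Fraction(1)] + [Fraction(0)] * n_max
    term = result[:]
    for k in range(1, n_max + 1):
        # term becomes p^k / k!
        term = [c / k for c in series_mul(term, p, n_max)]
        result = [a + b for a, b in zip(result, term)]
    return result


def rooted_tree_counts(n_max):
    """Return [t_0, ..., t_n_max] by iterating T <- x * exp(T) on truncated series."""
    T = [Fraction(0)] * (n_max + 1)
    for _ in range(n_max):
        E = series_exp(T, n_max)
        T = [Fraction(0)] + E[:n_max]  # multiply by x and truncate
    return [int(c * factorial(n)) for n, c in enumerate(T)]
```

The first few values produced are 1, 2, 9, 64 and 625 for $n = 1, \ldots, 5$, in agreement with $n^{n-1}$.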


Exercises 


*11.2.1. Let $\mathcal{T}$ be a collection of structures, each of which can have labeled and unlabeled parts. An example is the
partitions of an $n$-set counted by $n > 0$ and by the number of blocks. Let $T(x,y)$ be the generating
function for $\mathcal{T}$, where it is exponential in the labeled parts (using $x$) and ordinary in the unlabeled
parts (using $y$). State and prove a Rule of Product for these generating functions.

11.2.2. Let $\mathcal{T}$ be a collection of structures, each of which has at least one labeled part. Let $E_{\mathcal{T}}$ be the EGF
for $\mathcal{T}$ with respect to those labeled parts. Prove the following results and compare them with those
in Exercise 10.4.1 (p. 298). When talking about lists (or sets) of structures in this exercise, the labels
that are used for the objects all differ; that is, structures at different positions in a list (or set) must
have different labels. If a totality of $n$ objects in a class appear, they are to use the labels in $\underline{n}$.

(a) The EGF for $k$-lists of structures is $(E_{\mathcal{T}})^k$.

(b) The EGF for lists of structures is $(1 - E_{\mathcal{T}})^{-1}$. Here lists of any length are allowed, including the
empty list.

(c) The EGF for sets of structures, where each structure must come from $\mathcal{T}$, is $\exp(E_{\mathcal{T}})$.

(d) The EGF for circular lists is $-\ln(1 - E_{\mathcal{T}})$.

These results could be generalized to allow for unlabeled parts by using the result of Exercise 11.2.1.
11.2.3. Let $z(n,k)$ be the signless Stirling numbers of the first kind, defined in Example 11.12.

(a) Let $Z_n(y) = \sum_k z(n,k)\,y^k$. Show that $Z_n(y) = Z_{n-1}(y)\,(y + n - 1)$ and use this to deduce
the recursion

\[ z(n,k) = z(n-1,k-1) + (n-1)\,z(n-1,k). \]

(b) Prove the previous recursion by a direct counting argument using the fact that $z(n,k)$ is the
number of permutations of $\underline{n}$ with exactly $k$ cycles.

11.2.4. Let $a_n$ be the number of ways to place $n$ labeled balls into boxes numbered $1, 2, \ldots$ so that the number
of balls in the $k$th box is a multiple of $k$. (This is the labeled analog of the box-ball interpretation
of partitions of a number.) The EGF for $a_n$ can be written as a product of sums. Do it.

11.2.5. Let $a_n$ be the number of $n$ long sequences of A's, B's and C's such that any letter that actually appears in
the sequence must appear an odd number of times.

(a) Show that the EGF for $a_n$ is $\left(1 + \frac{e^x - e^{-x}}{2}\right)^{3}$.

(b) Show that for $n$ odd $a_n = (3^n + 9)/4$ and for $n > 0$ and even $a_n = 3 \times 2^{n-1}$.

11.2.6. Let $a_{n,k}$ be as in Exercise 11.2.5, except that $k$ different letters are used instead of just A, B and C.

(a) Obtain a generating function $\sum_n a_{n,k}\,x^n/n!$.

(b) Show that $a_{n,k}$ equals $\sum_j \binom{k}{j} 2^{-j} c_{n,j}$, where the sum is over all $j$ with the same parity as $n$ and
$\sum_n c_{n,j}\,x^n/n! = (e^x - e^{-x})^j$.

*(c) Can you obtain a more explicit formula for $c_{n,j}$?

11.2.7. Recall that $B_n$ is the number of partitions of the set $\underline{n}$. Let $B(x)$ be the EGF. Show that
$B'(x) = e^x B(x)$ and use this to show that $B_{n+1} = \sum_{k=0}^{n} \binom{n}{k} B_k$.

11.2.8. Let $a_n$ be the number of partitions of $\underline{n}$ such that each block has an odd number of elements and let
$A(x)$ be the EGF. Use the Exponential Formula to show that $A(x) = \exp\bigl((e^x - e^{-x})/2\bigr)$.

11.2.9. Let $C(x)$ and $G(x) = \exp(C(x))$ be the EGFs for some collection of connected graphs and some
collection of graphs, respectively.

(a) Let $H(x) = C(x)G(x)$ be an EGF for the sequence $h_n$. Using the Exponential Formula, show
that the average number of components in the graphs counted by $g_n$ is $h_n/g_n$.

(b) Prove the formula in the previous part by a simple application of the Rule of Product.

Hint. Remember that $H(x)$ will be counting the total number of components, not the average
number.

(c) Deduce the formula at the end of Example 11.12 for the average number of cycles in a permutation.

(d) Deduce that the average number of blocks in a partition of $\underline{n}$ is $B_{n+1}/B_n - 1$.

(e) Obtain a formula like the previous one for the average number of cycles in an involution.

11.2.10. Let $a_n$ be the number of partitions of $\underline{n}$ where the order of the blocks is important. Obtain the EGF
$A(x)$ and use it to show that $a_n = \sum_{k \ge 0} k^n/2^{k+1}$.

11.2.11. A permutation $f$ of $\underline{n}$ is said to be alternating if $f(1) < f(2) > f(3) < \cdots$. Let $a_n$ be the number
of alternating permutations of $\underline{n}$. It will be convenient to set $a_0 = 1$. Let $A$ be the EGF for the $a_n$'s
and let $B$ be the EGF for those $a_n$ with odd $n$; that is, $b_{2n} = 0$ and $b_{2n+1} = a_{2n+1}$. By considering
$f(1), \ldots, f(k-1)$ and $f(k+1), \ldots, f(n)$, where $k = f^{-1}(n)$, show that

\[ B'(x) = B(x)^2 + 1 \quad\text{and}\quad A'(x) = B(x)A(x) + 1. \]

Verify that $A(x) = \tan x + \sec x$ and $B(x) = \tan x$.
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11.2.12.  A  fc-ary  tree  is  a  rooted  tree  in  which  each  nonleaf  vertex  has  exactly  A;  sons.  Let  tn  be  the  number 
of  unlabeled  plane  /c-ary  trees  having  n  nonleaf  vertices. 

(a)  Prove  that  the  ordinary  generating  function  for  tn,  satisfies  the  equation  T{x)  =  1  +  xT{x)'^. 

(b)  Use  Lagrange  inversion  (Theorem  11.6)  to  show  that  tn  =  ^{n-l}- 
Hint.  Apply  the  theorem  to  S{x)  =  T(x)  —  1. 

11.2.13. A function $f: A \to A$ is said to have a square root if there is a function $g: A \to A$ such that
$g(g(a)) = f(a)$ for all $a \in A$.

(a) Show that a permutation has a square root if and only if it has an even number of cycles of
length 2k for each k > 0.

(b) Let $s_n$ be the number of permutations of n which have a square root. Show that

$$S(x) = \sum_{n \ge 0} s_n x^n/n! = \exp\Bigl(\sum_{\substack{k \ge 1 \\ k\ \mathrm{odd}}} \frac{x^k}{k}\Bigr) \prod_{\substack{k \ge 2 \\ k\ \mathrm{even}}} \frac{\exp(x^k/k) + \exp(-x^k/k)}{2}.$$

Hint. One way is to use $\exp\bigl(\sum_{k \ge 1} z_k x^k/k\bigr)$ and then use bisection of series for each even k
one by one. Another approach is to do each cycle length separately and then use the Rule of
Product.

(c) Using $\cosh(u) = (e^u + e^{-u})/2$ and bisection of the Taylor series for $\ln(1 - x)$, show that

$$S(x) = \sqrt{\frac{1+x}{1-x}} \prod_{k \ge 1} \cosh\bigl(x^{2k}/2k\bigr).$$

(d) By taking logarithms, differentiating and then multiplying by $S(x)$, conclude that $S'(x) =
S(x)B(x)$ where $B(x)$ is $(1 - x^2)^{-1}$ plus the sum of $x^{n-1}\tanh(x^n/n)$ over all even positive
n. How much work is involved in using this to construct tables of $s_n$? Can you think of an easier
method?

11.2.14. Let $a_{n,k}$ be the number of permutations of n having exactly k cycles of even length.

(a) Show that

$$\sum_{n,k} a_{n,k} \frac{x^n}{n!} y^k = \sqrt{\frac{1+x}{1-x}}\,(1 - x^2)^{-y/2}.$$

(b) Conclude that the average number of even length cycles in a permutation of n is

$$\sum_{k=1}^{\lfloor n/2 \rfloor} \frac{1}{2k},$$

where the floor function $\lfloor x \rfloor$ is the largest integer not exceeding x.

(c) Show that the number of odd length cycles minus the number of even length cycles averaged
over all permutations of n is

$$\sum_{k=\lfloor n/2 \rfloor + 1}^{n} \frac{1}{k}$$

and that this sum approaches ln 2 as $n \to \infty$.

*11.2.15. Let $l_{n,k}$ be the number of n-vertex rooted labeled trees with k leaves. Let $L(x, y) = \sum_{n,k} l_{n,k} (x^n/n!) y^k$.
(The generating function for leaves is ordinary because the labels on the leaves have already been
taken into account when we counted vertices.) Let $T(x)$ be the EGF for rooted labeled trees by
number of vertices.

(a) Show that $L = xe^L - x + xy$.

(b) Let $U(x)$ be the EGF $\partial L(x,y)/\partial y \,\big|_{y=1}$. Show that the average number of leaves in an n-vertex rooted
labeled tree is $u_n/n^{n-1}$.

*(c) Show that $U(x) = x^2 T'(x) + x$ and use this to show that the average number of leaves in an
n-vertex rooted labeled tree is $n(1 - 1/n)^{n-1}$. Conclude that the probability that a randomly
chosen vertex is a leaf approaches 1/e as $n \to \infty$.

*11.2.16. In this exercise, you'll study the average height of vertices in rooted labeled trees. The height of a
vertex is the number of edges on the path from it to the root. For a rooted tree T, let $h(T)$ be the
sum of the heights of the vertices of T. Let $t_{n,k}$ be the number of n-vertex rooted labeled trees T
with $h(T) = k$ and let $H(x,y) = \sum_{n,k} t_{n,k} (x^n/n!) y^k$. If T has n vertices, the average height of a
vertex in T is $h(T)/n$. Let $\mu(n)$ be the average of $h(T)/n$ over all n-vertex rooted labeled trees.

(a) Show that

$$n^n \mu(n) = \sum_k k\, t_{n,k}$$

and that $n^n \mu(n)$ is the coefficient of $x^n/n!$ in $D(x) = \partial H(x,y)/\partial y \,\big|_{y=1}$.

(b) Show that $H(x,y) = x \exp\bigl(H(xy,y)\bigr)$.

(c) Show that

$$D(x) = \frac{xT'(x)T(x)}{1 - T(x)} = \Bigl(\frac{T(x)}{1 - T(x)}\Bigr)^2,$$

where $T(x)$ is the EGF for rooted labeled trees by vertices.

(d) Use Lagrange inversion with $g(u) = \bigl(\frac{u}{1-u}\bigr)^2$ to show that

$$\mu(n) = 2(n-1)! \sum_{k=0}^{n-2} \binom{k+2}{2} \frac{n^{-k-2}}{(n-k-2)!}.$$

One can obtain estimates of $\mu(n)$ for large n from this formula, but we will not do so.

11.2.17. A functional digraph is a simple digraph in which each vertex has outdegree 1. It is connected if the
associated graph is connected. Let $\mathcal{F}_n$ be the set of connected n-vertex functional digraphs and let
$f_n = |\mathcal{F}_n|$.

(a) Define a function $\varphi$ from $\underline{n}^{\underline{n}}$ to digraphs with $V = \underline{n}$ as follows:

for $g \in \underline{n}^{\underline{n}}$, $\varphi(g) = (V, E)$ where $E = \{(x, g(x)) \mid x \in \underline{n}\}$.

Prove that $\varphi$ is a bijection from $\underline{n}^{\underline{n}}$ to the set of all functional digraphs with vertex set $\underline{n}$. (See
Example 5.9 (p. 142).)

(b) Show that a connected functional digraph consists of a circular list of rooted trees, with each
tree's edges directed toward the root and with the roots joined in a cycle (as indicated by the
circular list). We're not asking for a proof, just a reasonable explanation of why this is true.
Drawing some pictures may help.

(c) Show that $\sum_n f_n x^n/n! = -\ln(1 - T(x))$, where $T(x)$ is the EGF for rooted labeled trees.

(d) Using the fact that $T(x) = xe^{T(x)}$ and Lagrange inversion, deduce

$$f_n = (n-1)! \sum_{k=0}^{n-1} \frac{n^k}{k!}.$$

11.2.18. A rooted map is an unlabeled graph that has been embedded in the plane and has had one edge
distinguished by assigning a direction to it and selecting a side of it. Tutte developed some clever
techniques for counting such structures. Using them it can be shown that

$$M(x) = (1 - 4u)(1 - 3u)^{-2} \quad\text{where}\quad x = u(1 - 3u)$$

and M is the ordinary generating function for $m_n$, the number of n-edge rooted maps. Prove that

$$m_n = \frac{2\,(2n)!\,3^n}{n!\,(n+2)!}.$$

Hint. The notation may be a bit confusing for using Theorem 11.6. Note that $T(x)$ is simply the
function that is given implicitly, so in this case $T(x) = u$.


11.3 Symmetries and Polya's Theorem


In this section we will discuss a generalization of the Burnside Lemma. We will then consider an
important special case of this generalization, namely Polya's Theorem. You should review the statement
and proof of the Burnside Lemma (Theorem 4.5 (p. 112)).

Let S be a set with a permutation group G. Recall that we say $x, y \in S$ are equivalent if $y = g(x)$
for some $g \in G$. (These equivalence classes are referred to as orbits of G in S.) Suppose further that
there is a function W defined on S such that W is constant on equivalence classes. This means that if
x and y are equivalent, then $W(y) = W(x)$. We can rephrase "W is constant on equivalence classes"
as "$W(g(x)) = W(x)$ for all $g \in G$ and all $x \in S$."

You may have noticed that W is not completely specified because we haven't defined its range.
We don't really care what the range is as long as addition of range elements and multiplication of
them by rationals is possible. Thus the range might be the real numbers, polynomials with rational
coefficients, or lots of other things.

Let $\mathcal{E}$ be the set of equivalence classes of S with respect to the group G. (Don't confuse $\mathcal{E}$ with
our notation for exponential generating functions.) If $B \in \mathcal{E}$, define $W(B) = W(y)$, where y is any
element of B. This definition makes sense because W is constant on equivalence classes.

Theorem 11.7 The Weighted Burnside Lemma  With the above definitions,

$$\sum_{B \in \mathcal{E}} W(B) = \frac{1}{|G|} \sum_{g \in G} N(g),$$

where $N(g)$ is the sum of $W(x)$ over all $x \in S$ such that $g(x) = x$.


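Before reading further, you should be able to see that the case $W(x) = 1$ for all $x \in S$ is just
Burnside's Lemma. This is simply a matter of understanding the notation we've introduced.

Proof: The proof of this result is a simple modification of the proof of Burnside's Lemma. Here
it is. You should be able to supply the reasons for all the steps by referring back to the proof of
Burnside's Lemma.

$$\sum_{B \in \mathcal{E}} W(B) = \sum_{y \in S} \frac{W(y)}{|B_y|} = \frac{1}{|G|} \sum_{y \in S} \Bigl(\sum_{g \in G} \chi(g(y) = y)\Bigr) W(y)$$

$$= \frac{1}{|G|} \sum_{g \in G} \Bigl(\sum_{y \in S} \chi(g(y) = y)\, W(y)\Bigr) = \frac{1}{|G|} \sum_{g \in G} N(g).$$

Here $B_y$ is the equivalence class containing y, and $\chi(\text{statement})$ is 1 if the statement is true and 0 otherwise.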

Example 11.15 The Ferris wheel  Let's return to the "Ferris wheel" problem of Example 1.12
(p. 13). Recall that we want to look at circular arrangements of ones and twos where the circles
contain six digits. In this case, the group G contains 6 elements, which can be described by how
they rearrange the six positions on the Ferris wheel. We called the group elements $g_0$ through $g_5$,
where $g_i$ shifts things circularly i positions.

As already noted, if we set $W(x) = 1$ for all x, then we simply end up counting all equivalence
classes. Suppose, instead, that we set $W(x) = 1$ if x contains exactly 4 ones and $W(x) = 0$ otherwise.
This is an acceptable definition of W because two equivalent sequences contain the same number
of ones. In this case, we end up counting the equivalence classes of sequences that contain exactly
4 ones.

A more interesting example is obtained by letting z be a variable and setting $W(x) = z^k$, where
k is the number of ones in x. In this case the coefficient of $z^k$ in $W(\mathcal{E}) = \sum_{B \in \mathcal{E}} W(B)$ is the
number of equivalence classes whose sequences each contain exactly k ones. In other words, $W(\mathcal{E})$ is
a generating function for equivalence classes of sequences by number of ones. Let's compute $W(\mathcal{E})$
in this case. For $N(g_0)$, any sequence is allowed since $g_0(x) = x$ for all x. Thus each position can be
either a two or a one. By the Rules of Sum and Product for generating functions, $N(g_0) = (1 + z)^6$.
If $g_1(x) = x$, all positions must be the same and so $N(g_1) = 1 + z^6$. If $g_2(x) = x$, the even numbered
positions must have the same value and the odd numbered positions must have the same value. By the
Rules of Sum and Product for generating functions, $N(g_2) = (1 + z^3)^2$. Similarly, $N(g_3) = (1 + z^2)^3$,
$N(g_4) = N(g_2)$ and $N(g_5) = N(g_1)$. Thus we obtain

$$W(\mathcal{E}) = \tfrac{1}{6}\bigl((1 + z)^6 + 2(1 + z^6) + 2(1 + z^3)^2 + (1 + z^2)^3\bigr)
= 1 + z + 3z^2 + 4z^3 + 3z^4 + z^5 + z^6. \tag{11.25}$$

One does not need this machinery to compute the generating function. It is quite simple to construct
it from the leaves of the decision tree in Figure 4.2. □
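The computation in Example 11.15 can be checked by brute force. The following is a minimal sketch, not part of the original text: it stores a polynomial in z as a list of coefficients, and the helper names (`shift`, `weight`, `add`) are our own. It evaluates both sides of the Weighted Burnside Lemma for the six-position Ferris wheel.

```python
from itertools import product

N = 6  # number of positions on the Ferris wheel


def shift(seq, i):
    """Apply g_i: shift the sequence circularly by i positions."""
    return seq[i:] + seq[:i]


def weight(seq):
    """W(x) = z^k, k = number of ones, as a coefficient list [z^0, ..., z^N]."""
    poly = [0] * (N + 1)
    poly[seq.count(1)] = 1
    return poly


def add(p, q):
    """Add two polynomials given as equal-length coefficient lists."""
    return [a + b for a, b in zip(p, q)]


sequences = list(product((1, 2), repeat=N))

# Left side: one weight per equivalence class, using the smallest
# shifted copy of each sequence as the class representative.
orbits = {min(shift(s, i) for i in range(N)) for s in sequences}
lhs = [0] * (N + 1)
for representative in orbits:
    lhs = add(lhs, weight(representative))

# Right side: (1/|G|) * sum over g of N(g), where N(g) is the total
# weight of the sequences fixed by g.
total = [0] * (N + 1)
for i in range(N):
    for s in sequences:
        if shift(s, i) == s:
            total = add(total, weight(s))
rhs = [c // N for c in total]
```

Both `lhs` and `rhs` should hold the coefficients of (11.25), listed from the constant term up to $z^6$.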


We now describe a particularly important case of the Weighted Burnside Lemma. Let A and B
be finite sets, let S be the functions $B^A$ and let G be a permutation group on A. We make G into
a permutation group on $B^A$ by defining $g(f)$ for every $g \in G$ and every $f \in B^A$ to be the function
given by

$$(g(f))(a) = f(g(a)) \quad\text{for all } a \in A.$$

Let W be a function from B to a set in which we can divide by integers, add and multiply. Among
such sets are the set of real numbers, the set of polynomials in any number of variables, and the set
of power series. (There will be a concrete example soon.) We make W into a weight function on $B^A$
by setting

$$W(f) = \prod_{a \in A} W(f(a)).$$

Why is W constant on equivalence classes? Since f and $g(f)$ are equivalent, the second line of $g(f)$
in two line form is simply a permutation of the second line of f; however, $W(f)$ is simply the product
of the weights of the elements in its second line, without regard to their order.

Example  11.16  The  Ferris  wheel  revisited  Let's  return  to  the  previous  example.  We  can 
phrase  it  in  our  new  terminology: 

A  =  {1,2,3,4,5,6}; 
B  =  {1,2}; 

G  =  the  circular  shifts  of  1,2,3,4,5,6; 

$W = \begin{pmatrix} 1 & 2 \\ z & 1 \end{pmatrix}$, that is, W(1) = z and W(2) = 1.

For example, if g = (1, 3, 5)(2, 4, 6) and $f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 2 & 2 & 2 & 1 & 1 \end{pmatrix}$, then
$g(f) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 2 & 1 & 1 & 1 & 2 \end{pmatrix}$ and $W(f) = z^3$.
You should be able to verify the following observations.

(a) An element of $B^A$ can be viewed as a 6-long sequence of ones and twos.

(b) G permutes these 6-long sequences just as the G in the previous example did.

(c) $W(f)$ is z raised to a power which equals the number of ones in the sequence $f(1), \ldots, f(6)$.

These observations show that this problem is the same as the previous one. □

Why is this special situation with $S = B^A$ important? First, because many problems that are
done with the Weighted Burnside Lemma can be phrased this way. Second, because it is easier to
apply the lemma in this particular case. The method for applying the lemma is known as Polya's
Theorem. Before stating the theorem, we'll look at a special case.

Example 11.17 The Ferris wheel revisited  In order to compute $N(g)$ for the Ferris wheel, we
need to study those functions f such that $g(f) = f$. Look at g = (1, 3, 5)(2, 4, 6) again. You should be
able to see that $(g(f))(a) = f(a)$ for all $a \in A$ if and only if $f(1) = f(3) = f(5)$ and $f(2) = f(4) = f(6)$.
For example, $(g(f))(1) = f(g(1)) = f(3)$, and so $g(f) = f$ implies that $f(3) = f(1)$. More generally,
you should be able to see that for any permutation g, we have $g(f) = f$ if and only if f is constant
on each of the cycles of g.

To compute the sum of the weights of the functions f with $g(f) = f$, we can look at how to
construct such a function:

1. First choose a value for f on the cycle (1, 3, 5) AND

2. then choose a value for f on the cycle (2, 4, 6).

On the first cycle, the value of f is either one OR two. If f is one, this part of f contributes $z^3$ to
the weight $W(f)$. If f is two, this part of f contributes 1 to the weight $W(f)$. Using the Rules of
Sum and Product, we get that all f with $g(f) = f$ contribute a total of $(z^3 + 1)(z^3 + 1)$; that is,
$N(g) = (z^3 + 1)^2$. □

In order to state Polya's Theorem, we must define the cycle index $Z_G$ of a group G of permutations.
Let $c_i(g)$ be the number of cycles of g of length i. Then

$$Z_G(x_1, x_2, \ldots) = \frac{1}{|G|} \sum_{g \in G} x_1^{c_1(g)} x_2^{c_2(g)} \cdots.$$

If G is the cyclic shifts of the sequence 1, 2, 3, 4, 5, 6, you should be able to show that

$$Z_G = \tfrac{1}{6}\bigl(x_1^6 + 2x_6 + 2x_3^2 + x_2^3\bigr). \tag{11.26}$$
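The two facts used here can be confirmed by direct enumeration: a function is fixed by g exactly when it is constant on the cycles of g, and the six cyclic shifts have the cycle structure shown in (11.26). The sketch below is only an illustration. It represents a permutation as a dictionary, and the names `cycles_of`, `act` and `shift_perm` are hypothetical helpers, not notation from the text.

```python
from collections import Counter
from itertools import product

A = range(1, 7)  # the six positions


def cycles_of(g):
    """Return the cycles of a permutation given as a dict a -> g(a)."""
    seen, result = set(), []
    for start in g:
        if start in seen:
            continue
        cycle, a = [], start
        while a not in seen:
            seen.add(a)
            cycle.append(a)
            a = g[a]
        result.append(tuple(cycle))
    return result


def act(g, f):
    """Return g(f), defined by (g(f))(a) = f(g(a))."""
    return {a: f[g[a]] for a in f}


def shift_perm(i, n=6):
    """The circular shift g_i acting on positions 1..n."""
    return {a: (a - 1 + i) % n + 1 for a in range(1, n + 1)}


g = shift_perm(2)  # the permutation (1,3,5)(2,4,6)

# g(f) = f holds exactly when f is constant on every cycle of g.
for values in product((1, 2), repeat=6):
    f = dict(zip(A, values))
    constant_on_cycles = all(len({f[a] for a in c}) == 1 for c in cycles_of(g))
    assert (act(g, f) == f) == constant_on_cycles

# Cycle types of the six shifts; each type is a sorted tuple of cycle lengths.
cycle_types = Counter(
    tuple(sorted(len(c) for c in cycles_of(shift_perm(i)))) for i in range(6)
)
```

Reading `cycle_types` as monomials, a tuple such as `(3, 3)` stands for $x_3^2$, and its count is the coefficient before dividing by 6.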

Theorem 11.8 Polya's Theorem  If $S = B^A$ and G and W are defined as above, then

$$\sum_{E \in \mathcal{E}} W(E) = Z_G(x_1, x_2, \ldots),$$

where

$$x_i = \sum_{b \in B} (W(b))^i.$$

This can be proved using the idea that we introduced in the previous example together with the
Weighted Burnside Lemma. The proof is left as an exercise. You will probably find it easier to prove
after reading some examples.

Example 11.18 The Ferris wheel revisited  Now we'll apply Polya's Theorem to derive
(11.25). Since B = {1, 2}, W(1) = z and W(2) = 1, we have $x_1 = z + 1$, $x_2 = z^2 + 1$ and so on.
Substituting these values into (11.26), we obtain (11.25).

Now let's consider a Ferris wheel of arbitrary length n. Our group consists of the n cyclic shifts
$g_0, \ldots, g_{n-1}$, where $g_i$ shifts the sequence circularly i positions. This group is known as the cyclic
group of order n and is usually denoted $C_n$. In order to apply Polya's Theorem, we need to compute
$Z_{C_n}$, which we now do.

The element $g_i$ shifts something in position p to position p + i, then to p + 2i and so on, where
all these values are reduced modulo n, which means we divide them by n and keep the remainder.
For example, if n = 6, i = 4 and p = 3, the successive values of the positions are 3, 1 (the remainder
of 7/6), 5 and back to 3. You should be able to see easily that the length of the cycle containing p
depends only on n and $g_i$ and not on the choice of p. Thus all cycles of $g_i$ have the same length.
What is that length?

Suppose we return to position p after k steps. This can happen if and only if dividing p + ki by
n gives a remainder of p. In other words, ki must be a multiple of n. Since ki is also a multiple of i,
it must be a multiple of the least common multiple of i and n, which is written lcm(i, n). Thus, the
smallest possible value of ki is lcm(i, n). It follows that

$$\text{the length of each cycle of } g_i \text{ is } \frac{\mathrm{lcm}(i,n)}{i}. \tag{11.27}$$

Since the cycles must contain n items between all of them, the number of cycles is

$$\frac{n}{\mathrm{lcm}(i,n)/i} = \frac{ni}{\mathrm{lcm}(i,n)} = \gcd(i,n),$$

where the last equality is a fairly easy number theory result (which is left as an exercise). Incidentally,
this number theory result enables us to rewrite (11.27) as

$$\text{the length of each cycle of } g_i \text{ is } \frac{n}{\gcd(i,n)}.$$

It follows from all this discussion that $g_i$ contributes the term $(x_{n/k})^k$, where $k = \gcd(n, i)$, to the
sum for the cycle index of $C_n$. Thus

$$Z_{C_n} = \frac{1}{n} \sum_{i=0}^{n-1} \bigl(x_{n/\gcd(n,i)}\bigr)^{\gcd(n,i)}. \tag{11.28}$$

With n = 6, (11.28) gives us (11.26), as it should. Carry out the calculation. Notice that some
of the terms were equal so we were able to collect them together. It would be nice to do that in
general. This means we need to determine when various values of gcd(n, i) occur. We leave this as
an exercise. □
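Formula (11.28) is easy to evaluate mechanically. The following sketch is an illustration under our own conventions, not the book's: it builds the terms of $Z_{C_n}$ from gcd(n, i), then substitutes $x_i = 1 + z^i$ as in Example 11.18. Polynomials in z are coefficient lists, and the function names are hypothetical.

```python
from collections import Counter
from math import gcd


def cyclic_cycle_index(n):
    """Terms of n * Z_{C_n} as {(cycle length, number of cycles): count}."""
    terms = Counter()
    for i in range(n):
        k = gcd(n, i)  # number of cycles of g_i; note gcd(n, 0) == n
        terms[(n // k, k)] += 1
    return terms


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def ferris_wheel_polynomial(n):
    """Substitute x_i = 1 + z^i into Z_{C_n}; coefficients of z^0..z^n."""
    total = [0] * (n + 1)
    for (length, k), count in cyclic_cycle_index(n).items():
        x = [1] + [0] * (length - 1) + [1]  # the polynomial 1 + z^length
        term = [1]
        for _ in range(k):
            term = poly_mul(term, x)
        for degree, c in enumerate(term):
            total[degree] += count * c
    return [c // n for c in total]
```

For n = 6 the terms reproduce (11.26) and the substituted polynomial reproduces (11.25). Summing the coefficients for any n gives the total number of inequivalent wheels with two kinds of digits.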

Example 11.19 Unlabeled rooted trees  Although we have studied the number of various
types of unlabeled RP-trees, we haven't counted those that are not in the plane. There's a good
reason for that: we need Polya's theorem or something similar.

We'll look at unlabeled rooted trees where each vertex has at most three edges directed away
from the root. Let $t_n$ be the number with n vertices. We want to study $T(x)$, the ordinary generating
function for the sequence $t_0, t_1, \ldots$. There are various reasons for doing this. First, it is not too difficult
and not too easy. Second, these trees correspond to an important class of organic compounds: Each
of the vertices corresponds to a carbon atom. These carbon atoms are all joined by single bonds.
A "radical" (for example, -OH or -COOH) is attached by a single bond to the carbon atom that
corresponds to the root. Finally, to give each carbon atom valency four, hydrogen atoms are attached
as needed. Compounds like this with the -OH radical are known as alcohols. Two alcohols with the
same number of carbon atoms but different associated trees are called isomers. The two isomers of
propyl alcohol are

      H   OH  H                H   H   H
      |   |   |                |   |   |
  H - C - C - C - H        H - C - C - C - OH
      |   |   |                |   |   |
      H   H   H                H   H   H

The  corresponding  rooted  trees  are  o— •— o  and  o— o— •,  respectively,  where  •  indicates  a  root  and  o  a 
nonroot. 

We  can  approach  this  problem  the  same  way  we  did  RP-trees:  We  take  a  collection  of  unlabeled 
rooted  trees  and  join  them  to  a  new  root.  There  are  two  important  differences  from  our  previous 
considerations  of  RP-trees. 

•  Since a vertex has at most three sons, we must take this into account. (Previously we dealt
mostly with exactly two sons.)

•  There  is  no  ordering  among  the  sons.  This  is  what  we  mean  by  not  being  in  the  plane — the  sons 
are  not  ordered  from  left  to  right;  they  are  simply  a  multiset  of  trees.  In  terms  of  symmetries, 
this  means  that  all  permutations  of  the  sons  give  equivalent  trees. 

Let's begin with the problem of at most three sons. One way to handle this is to sum up the
cases of exactly one, two and three sons. There is an easier way. We will allow the empty tree, so
$t_0 = 1$. By taking three trees, we get at most three sons since any or all of them could be the empty
tree.

Polya's Theorem can be applied in this situation. In the notation of the theorem, $A = \underline{3}$ and B
is the set of rooted trees that we are counting. A function in $B^A$ selects a list of three trees to be
the sons of the root. Since all permutations of these sons are possible, we want to study the group
of all possible permutations on three things. This group is known as the symmetric group on three
things and is denoted $S_3$. More generally, one can speak of $S_n$. Since all permutations of n things
are allowed, $S_n$ contains n! elements, the number of permutations of an n-set.

By writing down all six permutations of $\underline{3}$, you should be able to easily show that

$$Z_{S_3} = \frac{x_1^3 + 3x_1x_2 + 2x_3}{6}.$$

We need to compute $x_i$ so that we can apply Polya's Theorem. As noted earlier, B is the set of
all unlabeled rooted trees of the sort we are constructing. $W(b)$ for a tree b is simply $x^k$, where k is
the number of vertices of b. It follows that $x_i = T(x^i)$. Thus we have

$$T(x) = 1 + x\,\frac{T(x)^3 + 3T(x)T(x^2) + 2T(x^3)}{6}, \tag{11.29}$$

where the "1 +" is present because a (possibly empty) tree is constructed by taking the empty set
OR ....

Equation (11.29) can be used to compute $t_n$ recursively. You should be able to do this. If you do
this for n up to five or so, you will discover that it is probably easier to simply list the trees and then
count them. On the other hand, suppose you want the answer up to n = 20. You will probably want
to use a computer. While it is certainly possible to write a program to list the trees, it is probably
easier to use a symbolic manipulation package. Simply start with $T(x) = 1$, which is obviously the
beginning of the generating function. Then apply (11.29) n times to compute new values of $T(x)$.
To avoid overflow and/or excessive running time, you should truncate all calculations to terms of
degree at most n.

There is another situation in which (11.29) is better than listing. Suppose that we want to
get an estimate of, say, $t_{100}$. Asymptotic methods provide a way for doing this using (11.29). See
Example 11.34 (p. 354). □
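The iteration described above does not need a symbolic package. A minimal sketch follows, assuming polynomials are stored as integer coefficient lists truncated at degree n; the function names are our own. Each pass applies (11.29) once, and after n passes the coefficients through degree n are correct.

```python
def mul(p, q, n):
    """Product of two polynomials, truncated to degree n."""
    r = [0] * (n + 1)
    for i, a in enumerate(p):
        if a == 0:
            continue
        for j, b in enumerate(q):
            if i + j > n:
                break
            r[i + j] += a * b
    return r


def substitute_power(p, m, n):
    """Return p(x^m), truncated to degree n."""
    r = [0] * (n + 1)
    for i, a in enumerate(p):
        if i * m > n:
            break
        r[i * m] = a
    return r


def rooted_trees_outdegree_at_most_3(n):
    """Coefficients t_0..t_n of T(x), found by iterating (11.29) n times."""
    t = [1] + [0] * n  # start with T(x) = 1
    for _ in range(n):
        cube = mul(mul(t, t, n), t, n)             # T(x)^3
        mixed = mul(t, substitute_power(t, 2, n), n)  # T(x) T(x^2)
        t3 = substitute_power(t, 3, n)             # T(x^3)
        total = [cube[k] + 3 * mixed[k] + 2 * t3[k] for k in range(n + 1)]
        # Multiply by x, divide by 6 (the division is exact), then add 1.
        t = [1] + [total[k] // 6 for k in range(n)]
    return t
```

Calling `rooted_trees_outdegree_at_most_3(20)` gives the values up to n = 20 mentioned in the text. The small cases can be compared against a hand listing of the trees.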

In  general,  computing  the  cycle  index  of  a  group  is  not  a  simple  matter.  The  examples  considered 
so  far  have  involved  relatively  simple  groups.  The  following  example  deals  with  a  somewhat  more 
complicated  situation — the  cycle  index  of  the  group  of  symmetries  of  a  cube,  where  the  group  is  a 
permutation  group  on  the  faces  of  the  cube. 

Example  11.20  Symmetries  of  the  cube  In  Exercise  4.2.7  (p.  110),  you  used  a  decision  tree 
to  study  the  ways  to  color  the  faces  of  a  cube.  Here  we'll  use  a  cycle  index  polynomial  to  study  it. 

Before  doing  this,  we  must  answer  two  questions: 

•  What  symmetries  are  possible?  Certainly,  we  should  allow  the  rotations  of  the  cube.  There  are 
other  symmetries  that  involve  reflections.  Whether  we  allow  these  or  not  will  depend  on  the 
problem  at  hand.  A  solid  cube  in  the  real  world  can  only  be  rotated;  however,  reflections  of  a 
cube  that  is  associated  with  a  chemical  compound  (like  the  trees  in  the  previous  example)  may 
be  a  perfectly  acceptable  physical  manipulation.  We'll  begin  with  just  rotations  and  then  allow 
reflections  as  well. 

•  What  objects  are  being  permuted?  Obvious  choices  are  the  vertices,  edges  and  faces  of  the  cube. 
There  are  less  obvious  ones  such  as  diagonals.  Different  choices  for  the  objects  will,  in  general, 
lead  to  different  permutation  groups  and  hence  different  cycle  index  polynomials.  Since  we  are 
coloring  faces,  we'll  choose  the  faces  of  the  cube  and  leave  other  objects  as  exercises. 

Before proceeding, we recommend that you find a cube if you can. Possibilities include a sugar cube,
a  die  and  a  homemade  cube  consisting  of  six  squares  of  cardboard  taped  together.  Here  is  a  picture 
of  an  "unfolded"  cube. 


        +---+
        | 1 |
    +---+---+---+---+
    | 2 | 3 | 4 | 5 |
    +---+---+---+---+
        | 6 |
        +---+                                        (11.30)


If  you  imagine  the  square  marked  3  as  being  the  base,  squares  1,  2,  4  and  6  fold  up  to  produce  sides 
and  square  5  folds  over  to  become  the  top. 

What  axes  can  the  cube  be  rotated  around  and  through  what  angles  so  that  it  will  occupy  the 
same  space?  The  axes  fall  into  three  classes: 

•  Face centered: This type goes through the centers of opposite pairs of faces. There are three of
them, through the faces 1-6, 2-4 and 3-5. A cube can be rotated 0°, ±90° or 180° about such
an axis.

•  Edge centered: This type goes through the centers of opposite pairs of edges. There are six of
them. An edge can be described by the two faces that lie on either side of it. In this notation,
one edge centered axis is 25-34. A cube can be rotated 0° or 180° about such an axis.

•  Vertex  centered:  This  type  goes  through  opposite  vertices  of  the  cube.  There  are  four  of  them, 
one  of  which  is  125-346,  where  a  vertex  is  described  by  the  three  faces  that  meet  there.  A  cube 
can  be  rotated  0°  or  ±120°  about  such  an  axis. 

To compute a term in the cycle index polynomial corresponding to a rotation, we can look at how
the rotation permutes the faces and then determine the term from the cycle lengths. For example,
the face centered rotation about 1-6 through 90° gives the permutation (1)(2, 3, 4, 5)(6). This gives
us the term $x_1 x_4 x_1 = x_1^2 x_4$. You should be able to establish the following terms by studying your
cube. The letters F, E and V describe the type of axis.

    no rotation    $x_1^6$
    ±90° F         $x_1^2 x_4$
    180° F         $x_1^2 x_2^2$
    180° E         $x_2^3$
    ±120° V        $x_3^2$

Altogether  there  are  24  rotations.  (We  have  counted  the  rotations  through  0°  just  once.) 

Before  we  add  up  the  24  terms,  we  must  verify  that  the  rotations  are  all  distinct.  One  way 
to  do  this  is  by  looking  at  the  cycles  of  the  24  permutations  of  the  faces  and  noting  that  they  are 
distinct.  Can  you  think  of  another  method  for  seeing  that  they  are  distinct?  You  might  try  a  different 
approach — instead  of  showing  that  the  rotations  are  distinct  directly,  show  that  there  must  be  24 
distinct  rotations  by  a  geometric  argument  and  then  use  the  fact  that  we  have  found  all  possible 
rotations. 

Adding up the 24 terms, the cycle index polynomial for the rotations of the cube in terms of
the faces of the cube is

$$\frac{x_1^6 + 6x_1^2x_4 + 3x_1^2x_2^2 + 6x_2^3 + 8x_3^2}{24}. \tag{11.31}$$

It follows that the number of rotationally inequivalent ways to color the faces of a cube using k
colors is

$$C(k) = \frac{k^6 + 6k^3 + 3k^4 + 6k^3 + 8k^2}{24} = \frac{k^6 + 3k^4 + 12k^3 + 8k^2}{24}. \tag{11.32}$$

How  many  rotationally  inequivalent  ways  can  the  cube  be  colored  with  3  colors  so  that  every  color 
appears?  We  cannot  substitute  directly  into  the  cycle  index  polynomial,  but  we  can  see  the  answer 
with  a  little  thought.  Can  you  do  it? 

*       *       *       Stop  and  think  about  this!        *        *  * 

One solution is to use the Principle of Inclusion and Exclusion together with (11.32). The answer is

$$C(3) - 3C(2) + 3C(1) - C(0) = 57 - 3 \times 10 + 3 \times 1 - 0 = 30.$$
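These counts can be reproduced without a physical cube. The sketch below is an illustration, with our own helper names. It assumes that the two 90° face rotations written as (2, 3, 4, 5) and (1, 2, 6, 4) in the labeling of (11.30) generate the rotation group, closes them under composition, and then applies Burnside's Lemma with k colors.

```python
from collections import Counter


def from_cycles(cycles, n=6):
    """Build a permutation of faces 1..n (stored 0-based) from its cycles."""
    perm = list(range(n))
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            perm[a - 1] = b - 1
    return tuple(perm)


def generate_group(generators):
    """Close a list of permutations under composition."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = tuple(s[g[i]] for i in range(len(g)))  # apply g, then s
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def cycle_type(p):
    """Sorted tuple of the cycle lengths of p."""
    seen, lengths = set(), []
    for start in range(len(p)):
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths))


ROTATIONS = generate_group([
    from_cycles([(2, 3, 4, 5)]),  # 90 degree rotation about the 1-6 axis
    from_cycles([(1, 2, 6, 4)]),  # 90 degree rotation about the 3-5 axis
])
TERM_COUNTS = Counter(cycle_type(g) for g in ROTATIONS)


def colorings(k):
    """C(k): face colorings with k colors, up to rotation."""
    return sum(k ** len(cycle_type(g)) for g in ROTATIONS) // len(ROTATIONS)


all_three_colors = colorings(3) - 3 * colorings(2) + 3 * colorings(1) - colorings(0)
```

`TERM_COUNTS` maps each cycle type to its coefficient in (11.31); for instance `(1, 1, 4)` stands for $x_1^2 x_4$. The last line repeats the inclusion and exclusion calculation.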

How many rotationally inequivalent ways can we color the faces of a cube with k colors so that
adjacent faces have distinct colors? This problem cannot be answered with Polya's Theorem; however,
it can be done with Burnside's Lemma.

We  now  turn  our  attention  to  rotations  and  reflections  of  the  cube.  Imagine  the  cube  made  from 
(11.30)  has  been  rotated  in  any  fashion.  Now  carry  out  the  following  operations. 

•  Rotate  it  so  that  face  3  is  on  the  bottom  and  face  2  is  on  the  left. 

•  Open the cube by cutting two sides of face 4 and all sides of face 5 except the one between it
and face 4.

This will always lead to the picture in (11.30). Now suppose you do the same thing with a cube that
has been rotated and, possibly, reflected. The result will be either (11.30) or (11.30) with 1 and 6
interchanged. You should convince yourself of this by experimentation or by geometric arguments.

The result of the previous paragraph implies that any reflection and rotation combination can
be obtained as either a rotation or a rotation followed by an interchange of the labels 1 and 6. Using
this observation, the cycle index of the group of rotations and reflections can be computed from a
knowledge of the orbits of the rotations. We will not carry out the tedious details. The result is

$$\frac{x_1^6 + 3x_1^4x_2 + 6x_1^2x_4 + 9x_1^2x_2^2 + 7x_2^3 + 6x_2x_4 + 8x_3^2 + 8x_6}{48}.$$

There are alternative geometric arguments that could be used to obtain this result. For example,
one can look at the possible arrangement of faces around the upper left front corner of the cube. □
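The tedious details can be delegated to a short program. As a sketch under the same assumptions as before (face labels from (11.30), helper names of our own choosing), we add the interchange of faces 1 and 6 to two generating rotations and tabulate the cycle types of the resulting group.

```python
from collections import Counter


def from_cycles(cycles, n=6):
    """Build a permutation of faces 1..n (stored 0-based) from its cycles."""
    perm = list(range(n))
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            perm[a - 1] = b - 1
    return tuple(perm)


def generate_group(generators):
    """Close a list of permutations under composition."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = tuple(s[g[i]] for i in range(len(g)))  # apply g, then s
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def cycle_type(p):
    """Sorted tuple of the cycle lengths of p."""
    seen, lengths = set(), []
    for start in range(len(p)):
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths))


FULL_GROUP = generate_group([
    from_cycles([(2, 3, 4, 5)]),  # 90 degree rotation about the 1-6 axis
    from_cycles([(1, 2, 6, 4)]),  # 90 degree rotation about the 3-5 axis
    from_cycles([(1, 6)]),        # interchange of faces 1 and 6
])
FULL_TERM_COUNTS = Counter(cycle_type(g) for g in FULL_GROUP)
```

Each key of `FULL_TERM_COUNTS` is a cycle type, so `(2, 4)` stands for $x_2 x_4$, and its value is the coefficient in the numerator of the cycle index above.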

*Example 11.21 Counting unlabeled graphs  How many unlabeled graphs are there with n
vertices and q edges? This has many variants: do we allow multiple edges? loops? Are the graphs
directed?

All  of  these  can  be  done  in  a  similar  manner  and  all  lead  to  messy  expressions.  We'll  choose  the 
simplest  case:  simple  directed  graphs  with  loops  allowed. 

In any of these situations, we use Polya's Theorem. Suppose that n, the number of vertices, is
given. Let $\underline{n}$ be the set of vertices. The functions we consider will be from $\underline{n} \times \underline{n}$ to {0, 1}. The value
of $f((u,v))$ is the number of edges from u to v. In the notation of Polya's Theorem, $S = B^A$ and G
acts on A. We have already said that B = {0, 1} and $A = \underline{n} \times \underline{n}$. What is G and what is the weight
W? The group G will be the group of all permutations of n things, but instead of acting on the
vertices $\underline{n}$, it must act on the ordered pairs $\underline{n} \times \underline{n}$. Most of this example will be devoted to explaining
and computing the cycle index of this group action. W is given by $W(i) = y^i$. The coefficient of
$y^q$ will then be the number of unlabeled simple digraphs with n vertices and q edges.

Before turning to the calculations, we remark how some of the other graph counting problems
can be dealt with. If loops are not allowed, remove the n ordered pairs of the form (i, i) from A. If
any number of edges is allowed, replace B by the nonnegative integers, still setting $W(i) = y^i$. To
count (loopless) graphs, replace $\underline{n} \times \underline{n}$ with $\mathcal{P}_2(\underline{n})$, the set of 2 element subsets of $\underline{n}$.

Let g be a permutation acting on $\underline{n}$. If we write g in cycle form, it is fairly easy to translate g
into a permutation of $\underline{n} \times \underline{n}$. For example, suppose that n = 3 and g = (1,2)(3). To avoid confusing
ordered pairs with cycles, we will indicate an ordered pair without parentheses and commas, e.g.,
13 instead of (1,3). Using g, we have the following two line form for the corresponding permutation
of $\underline{3} \times \underline{3}$:

$$\begin{pmatrix} 11 & 12 & 13 & 21 & 22 & 23 & 31 & 32 & 33 \\ 22 & 21 & 23 & 12 & 11 & 13 & 32 & 31 & 33 \end{pmatrix},$$

which you should be able to verify easily. In cycle form this is

(11,22)(12,21)(13,23)(31,32)(33),

which contributes $x_1x_2^4$ to the cycle index. How do we do this in general?

Suppose $u, v \in \underline{n}$ are two vertices, that u belongs to a cycle of g of length i and that v belongs
to a cycle of length j. The length of the cycle containing uv is the number of times we must apply
g in order to return to uv. After we apply g to uv k times, we will have advanced k positions in the
cycle containing u and k positions in the cycle containing v. Thus, we will return to uv after k times
if and only if k is a multiple of i and k is a multiple of j. The smallest positive such k is lcm(i, j),
the least common multiple of i and j. Thus uv belongs to a cycle of length lcm(i, j) and this cycle
contributes a factor of $x_{\mathrm{lcm}(i,j)}$ to a term of the cycle index.

Let's look more closely at the set of directed edges st that can be formed by choosing s from
the same cycle as u and t from the same cycle as v. There are ij such edges. Like uv, each edge st
lies in a cycle of length lcm(i, j). Thus, there are ij/lcm(i, j) = gcd(i, j) such cycles.

Let's look carefully at what we have shown. If we choose a cycle C of length i and another cycle
D of length j, then the ij ordered pairs in C × D lie in gcd(i, j) cycles of length lcm(i, j). Thus they
contribute a factor of $x_{\mathrm{lcm}(i,j)}^{\gcd(i,j)}$ to a term of the cycle index.
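
The gcd/lcm count can be checked by brute force on small cases. The following Python sketch is only an illustration: the function name `pair_cycle_type` and the dictionary representation of g are assumptions made here, not notation from the text. It applies g to both coordinates of each ordered pair and tallies the cycle lengths that result.

```python
from collections import Counter

def pair_cycle_type(g):
    """Cycle lengths of the permutation that g induces on ordered pairs.

    g is a dict mapping each vertex to its image (an assumed representation).
    Returns a Counter mapping cycle length -> number of cycles of that length.
    """
    seen = set()
    lengths = Counter()
    for start in ((u, v) for u in g for v in g):
        if start in seen:
            continue
        length, pair = 0, start
        # Follow the pair under repeated application of g until it returns.
        while pair not in seen:
            seen.add(pair)
            pair = (g[pair[0]], g[pair[1]])
            length += 1
        lengths[length] += 1
    return lengths

# g = (1,2)(3) from the text: one fixed pair and four 2-cycles, i.e. x_1 x_2^4.
print(pair_cycle_type({1: 2, 2: 1, 3: 3}))
```

For a permutation with one 4-cycle and one 6-cycle, the mixed pairs should fall into gcd(4, 6) = 2 cycles of length lcm(4, 6) = 12 for each ordering of the two cycles, which the same function confirms.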

If g acting on $\underline{n}$ has exactly $\nu_k$ cycles of length k, the argument above shows
that it contributes the term

$$\prod_{i,j} \left(x_{\mathrm{lcm}(i,j)}^{\gcd(i,j)}\right)^{\nu_i\nu_j}$$

to the cycle index we are computing. This gives us the following recipe for computing the cycle
index.

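Theorem 11.9 To compute the cycle index for the n-vertex unlabeled digraphs (with
loops allowed), start with the cycle index for the set of all n! permutations of $\underline{n}$. Replace every
term

$$\prod_{k=1}^{n} x_k^{\nu_k} \quad\text{with}\quad \prod_{i,j} \left(x_{\mathrm{lcm}(i,j)}\right)^{\gcd(i,j)\,\nu_i\nu_j},$$

where the latter product extends over all ordered pairs (i, j) of cycle lengths. □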
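
As a hedged illustration of the recipe, the Python sketch below enumerates cycle types of permutations of n things, forms the replaced term for each, and then applies Polya's Theorem by substituting $1 + y^k$ for $x_k$ (this corresponds to B = {0, 1} with $W(i) = y^i$). The helper names `partitions`, `poly_mul` and `digraph_counts` are hypothetical, chosen only for this example.

```python
from math import factorial, gcd

def partitions(n, largest=None):
    """Yield the partitions of n as lists of parts in nonincreasing order."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield [part] + rest

def poly_mul(p, q):
    """Multiply two polynomials in y given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def digraph_counts(n):
    """c[q] = number of unlabeled n-vertex digraphs (loops allowed) with q edges."""
    total = [0] * (n * n + 1)
    for parts in partitions(n):
        nu = {}                                  # nu[k] = number of k-cycles
        for k in parts:
            nu[k] = nu.get(k, 0) + 1
        perms = factorial(n)                     # permutations with this cycle type
        for k, m in nu.items():
            perms //= k ** m * factorial(m)
        term = [1]
        for i, mi in nu.items():
            for j, mj in nu.items():
                length = i * j // gcd(i, j)      # lcm(i, j)
                factor = [1] + [0] * (length - 1) + [1]   # 1 + y^lcm(i,j)
                for _ in range(gcd(i, j) * mi * mj):
                    term = poly_mul(term, factor)
        for q, coeff in enumerate(term):
            total[q] += perms * coeff
    return [t // factorial(n) for t in total]

print(digraph_counts(2))   # coefficients of y^0, ..., y^4
```

For n = 2 the cycle index is $(x_1^4 + x_2^2)/2$, and the substitution gives $1 + 2y + 4y^2 + 2y^3 + y^4$, which is what the sketch prints as a coefficient list.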


Exercises 


11.3.1.  This  deals  with  the  cycle  index  of  Cn,  the  cyclic  group.  You  will  need  to  know  that,  up  to  the  order 
of  the  factors,  every  number  can  be  factored  uniquely  as  a  product  of  primes.  We  take  that  as  given. 

(a) Suppose that $A = p_1^{a_1} \cdots p_k^{a_k}$ where $p_1, \ldots, p_k$ are distinct primes. We use the shorthand notation
$A = p^a$ for this product. Suppose that $B = p^b$. Let $c_i = \min(a_i, b_i)$, $d_i = \max(a_i, b_i)$, $C = p^c$
and $D = p^d$. Prove that AB = CD, C = gcd(A, B) and D = lcm(A, B). This establishes the
claims in Example 11.18.

(b) Prove that the number of integers i in {1, ..., n} for which gcd(n, i) = k is

• zero if k does not divide n;

• the number of integers j in {1, ..., n/k} for which gcd(j, n/k) = 1 if k divides n. This latter
number is denoted $\varphi(n/k)$ and is called the Euler phi function. We discussed how to compute
it in Exercise 4.1.5 (p. 100).

(c) Conclude that

$$Z_{C_n} = \frac{1}{n} \sum_{k} \varphi(n/k)\, x_{n/k}^{k},$$

where the sum ranges over all integers k between 1 and n inclusive that divide n.


11.3.2.  The  following  questions  refer  to  the  group  of  rotations  of  the  cube. 

(a)  Compute  the  cycle  index  of  the  group  acting  on  the  edges  of  the  cube. 

(b)  Compute  the  cycle  index  of  the  group  acting  on  the  vertices  of  the  cube. 

(c) Imagine three perpendicular axes drawn through the center of the cube joining the centers of
faces. Label the axes x, y and z, but do not distinguish a direction on the axes; thus the axes
are simply lines. Compute the cycle index of the group acting on these axes.

(d) Repeat the previous question where a direction is assigned to each of the axes. Reversal of the
direction of an axis is indicated by a minus sign; e.g., a rotation that reverses the z-axis and
interchanges the x-axis and y-axis is written in cycle notation as

(x, y)(-x, -y)(z, -z).

11.3.3. The regular octahedron consists of eight equilateral triangles joined together so that the result looks
like  two  pyramids  joined  base  to  base.  A  regular  octahedron  can  be  obtained  by  placing  a  vertex  in 
each  face  of  a  cube  and  joining  two  vertices  if  they  lie  in  faces  which  are  separated  by  an  edge. 

(a)  There  is  a  duality  between  a  regular  octahedron  and  a  cube  in  which  faces  correspond  to  vertices, 
edges  to  edges  and  vertices  to  faces.  Obtain  this  correspondence. 

(b) Write down the cycle index for the group of symmetries of the regular octahedron (reflections
allowed or not) acting on the vertices of the regular octahedron.
Hint. This requires no calculation on your part.

(c)  Do  the  same  for  the  rotations  of  the  octahedron  acting  on  the  edges. 

11.3.4. Write down the cycle index for the group of rotations of the regular tetrahedron acting simultaneously
on the vertices, edges and faces. Instead of the usual $x_i$, use $v_i$, $e_i$ and $f_i$, indicating whether it is an
orbit of vertices, edges or faces. For example, the identity rotation gives the term $v_1^4 e_1^6 f_1^4$. Explain
how to use this result to obtain the cycle index for the group acting on just the edges.

11.3.5. Write down the cycle index polynomial for all permutations of $\underline{4}$ and use this to write down
the ordinary generating function $D_4(y)$ for simple unlabeled 4-vertex digraphs by number of
edges.

11.3.6.  Repeat  the  previous  exercise  with  "4"  replaced  by  "5."  You  may  find  Exercises  2.2.5  and  2.3.3  (p.  57) 
helpful. 

*11.3.7.  State  and  prove  a  theorem  like  Theorem  11.9  for  unlabeled  n-vertex  simple  (loopless)  graphs. 

Hint.  You  will  need  to  distinguish  two  cases  depending  on  whether  or  not  u  and  v  are  in  the  same 
cycle  of  g  acting  on  n. 


The area of asymptotics deals with obtaining estimates for functions for large values of the variables
and, sometimes, for values near zero. Since the domain of the functions we're concerned with is the
positive integers, these functions can be thought of as sequences $a_1, a_2, \ldots$. Since this section uses
the terminology introduced in Appendix B, you may want to review it at this time.

A  solid  mathematical  treatment  of  asymptotics  requires  more  background  than  we  are  willing 
to  assume  and  developing  the  background  would  take  too  much  time.  Therefore,  the  material  in 
this  section  is  not  rigorous.  Instead,  we  present  several  principles  which  indicate  what  the  result  will 
almost  certainly  be  in  common  combinatorial  situations.  The  intent  of  this  section  is  to  give  you  a 
feeling  for  the  subject,  some  direction  for  future  study  and  some  useful  rules  of  thumb. 

Before  launching  into  specific  tools  and  examples,  we'd  like  to  set  the  stage  a  bit  since  you  are 
probably  unfamiliar  with  asymptotic  estimates.  The  lack  of  specific  examples  may  make  some  of 
this  introductory  material  a  bit  vague,  so  you  may  want  to  reread  it  after  completing  the  various 
subsections. 

Suppose we are interested in a sequence of numbers. We have four methods of providing asymptotic
information about the numbers. Here they are, with examples:

• A combinatorial description: say $B_n$ is the number of partitions of an n-set;

• A recursion: $F_0 = 1$, $F_1 = 2$ and $F_n = F_{n-1} + F_{n-2}$ for $n \ge 2$;

• A formula: the number of involutions of an n-set is

$$\sum_{j=0}^{\lfloor n/2 \rfloor} \frac{n!}{j!\,2^j\,(n-2j)!};$$

the number of unlabeled full binary RP-trees with n leaves is $\frac{1}{n}\binom{2n-2}{n-1}$;

• A generating function: the ordinary generating function for the number of comparisons needed
to Quicksort an n-long list is

$$\frac{-2\ln(1-x) - 2x}{(1-x)^2};$$

the ordinary generating function for the number of unlabeled rooted full binary trees by number
of leaves satisfies

$$T(x) = \frac{T(x)^2 + T(x^2)}{2} + x.$$
Given  such  information,  can  we  obtain  some  information  about  the  size  of  the  terms  in  the  sequence? 
The  answer  will,  of  course,  depend  on  the  information  we  are  given.  Here  is  a  quick  run  down  on 
the  answers. 

•  A  combinatorial  description:  It  is  usually  difficult,  if  not  impossible,  to  obtain  information  directly 
from  such  a  description. 

• A recursion: It is often possible to obtain some information. We will briefly discuss a simple
case.

• A formula: The formula by itself may be explicit enough. If it is not, using Stirling's formula may
suffice. If a summation is present, it can probably be estimated if all its terms are nonnegative,
but it may be difficult or impossible to estimate a sum whose terms alternate in sign. Unfortunately,
the estimation procedures usually involve a fair bit of messy calculation and estimation.
We will discuss two common types of sums.

• A generating function: If the generating function converges for some values of x other than
x = 0, it is quite likely that estimates for the coefficients can be obtained by using tools from
analysis. Tools have been developed that can be applied fairly easily to some common situations,
but rigorous application requires a background in complex analysis. The main emphasis of this
section is the discussion of some simple tools for generating functions.

You  may  have  noticed  that  we  have  discussed  only  singly  indexed  sequences  in  our  examples. 
There  are  fewer  tools  available  for  multiply  indexed  sequences  and  they  are  generally  harder  to 
describe  and  to  use.  Therefore,  we  limit  our  attention  to  singly  indexed  sequences. 

There is no single right answer to the problem of this section, finding simple approximations
to some $a_n$ for large n: we must first ask how simple and how accurate.

We will not try to specify what constitutes a simple expression; however, you should have
some feel for it. For example, an expression of the form $a n^b c^n$, where a, b and c are constants, is
simple. The expression $\sqrt{2\pi n}\,(n/e)^n$ is simpler than the expression n! even though the latter is more
easily written down. Why? If we limit ourselves to the basic operations of addition, subtraction,
multiplication, division and exponentiation, then the former expression requires only six operations
while n! requires n − 1 multiplications. We have hidden the work of multiplication by the use of a
function, namely the factorial function. We can estimate simplicity by counting the number of basic
operations.

There are wide variations in the degree of accuracy that we might ask for. Generally speaking,
we would like an approximating expression whose relative error goes to zero. In other words, given
$a_n$, we would like to find a simple expression f(n) such that $a_n/f(n) \to 1$ as $n \to \infty$. In this case
we say that $a_n$ is asymptotic to f(n) and write $a_n \sim f(n)$. Sometimes greater accuracy is desired, a
problem we will not deal with. Sometimes we may have to settle for less accuracy, a situation we
will be faced with.

The discussion of accuracy in the previous paragraph is a bit deceptive. What does $a_n \sim f(n)$
tell us about specific values? Nothing! It says that eventually $a_n/f(n)$ gets as close to 1 as we may
desire, but eventually can be a very long time. But, in most cases of interest, we are lucky enough
that the ratio $a_n/f(n)$ approaches 1 fairly quickly. Can we be more precise?

It is possible in most cases to compute an upper bound on how slowly $a_n/f(n)$ approaches 1;
that is, upper bounds on $|a_n/f(n) - 1|$ as a function of n. Obtaining such bounds often involves
a considerable amount of work and is beyond the scope of this text. Even if one has a bound it
may be unduly pessimistic: the ratio may approach one much faster than the bound says. A more
pragmatic approach is to compute $a_n/f(n)$ for small values of n and hope that the trend continues
for larger values. Although far from rigorous, this pragmatic approach almost always works well in
practice. We'll carry out such calculations for the problems studied in this section.
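
As a small illustration of this pragmatic approach, the following Python sketch tabulates the ratio $a_n/f(n)$ for $a_n = n!$ and Stirling's approximation $f(n) = \sqrt{2\pi n}\,(n/e)^n$. The function name `stirling_ratio` is an assumption made for this example; logarithms are used only so that large n does not overflow.

```python
import math

def stirling_ratio(n):
    """Return n! / (sqrt(2*pi*n) * (n/e)**n), computed with logarithms."""
    log_factorial = math.lgamma(n + 1)     # ln(n!)
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)
    return math.exp(log_factorial - log_stirling)

for n in (1, 5, 10, 100, 1000):
    print(n, stirling_ratio(n))
```

The printed ratios decrease toward 1, and already at n = 10 the relative error is below one percent.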

The  following  subsections  are  independent  of  each  other.  If  you  are  only  going  to  read  one  of 
them,  we  recommend  the  one  on  generating  functions. 

Recursions 


We have been able to solve the simplest sorts of recursions in earlier sections. Now our interest
is different: we want asymptotic information from the recursions. We will consider two types of
linear recursions that arise frequently in the analysis of algorithms. A linear recursion for $a_n$ is an
expression of the form

$$a_n = c_1(n)a_{n-1} + c_2(n)a_{n-2} + \cdots + c_n(n)a_0 + f(n)$$

for $n \ge N$. If f(n) = 0 for $n \ge N$, the recursion is called homogeneous.

We first discuss homogeneous recursions whose coefficients are "almost constant." In other words,
except for initial conditions,

$$a_n = c_1(n)a_{n-1} + c_2(n)a_{n-2} + \cdots + c_k(n)a_{n-k}, \tag{11.33}$$

where the functions $c_i(n)$ are nearly constant. If $c_i(n)$ is nearly equal to $C_i$, then the solution to

$$A_n = C_1A_{n-1} + C_2A_{n-2} + \cdots + C_kA_{n-k}, \tag{11.34}$$

with initial conditions, should be reasonably close to the sequence $a_n$. We will not bother to discuss
what reasonably close means here. It is possible to say something about it, but the subject is not
simple.

What can we say about the solution to (11.34)? Without the initial conditions, we cannot say
very much with certainty; however, the following is usually true.

Principle 11.1 Constant coefficient recursions Let r be the largest root of the equation

$$r^k = C_1r^{k-1} + C_2r^{k-2} + \cdots + C_kr^0. \tag{11.35}$$

If this root occurs with multiplicity m, then there is usually a constant A such that the solution
to (11.34) that satisfies our (unspecified) initial conditions is asymptotic to $A n^{m-1} r^n$.

This result is not too difficult to prove. Given the initial conditions, one can imagine using (11.34)
to obtain a rational function for $\sum_n A_n x^n$. The denominator of the rational function will be
$p(x) = 1 - (C_1x + \cdots + C_kx^k)$. Now imagine expanding the result in partial fractions. The reason for the
lack of a guarantee in the principle is that a factor may cancel from the numerator and denominator
of $\sum_n A_n x^n$, giving a lower degree polynomial than p(x).

The  principle  is  perhaps  not  so  interesting  because  it  gives  much  less  accurate  results  than  those 
obtained  by  using  generating  functions  and  partial  fractions.  Its  only  attractions  are  that  it  requires 
less  work  and  gives  us  an  idea  of  what  to  expect  for  (11.33). 
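
To see Principle 11.1 at work on a familiar case, the sketch below uses the recursion $F_0 = 1$, $F_1 = 2$, $F_n = F_{n-1} + F_{n-2}$ quoted earlier in this section. Here (11.35) reads $r^2 = r + 1$, whose largest root $r = (1+\sqrt{5})/2$ has multiplicity 1, so the principle predicts $F_n \sim A r^n$. The helper name `sequence` is an assumption made for the example.

```python
import math

def sequence(n_max):
    """Return [F_0, ..., F_{n_max}] with F_0 = 1, F_1 = 2, F_n = F_{n-1} + F_{n-2}."""
    values = [1, 2]
    while len(values) <= n_max:
        values.append(values[-1] + values[-2])
    return values

r = (1 + math.sqrt(5)) / 2          # largest root of r^2 = r + 1
F = sequence(60)
for n in (5, 10, 20, 40, 60):
    print(n, F[n] / r ** n)         # settles down to the constant A
```

The ratios stabilize quickly; for this particular recursion the limit can be identified as $A = r^2/\sqrt{5}$, although the principle itself only asserts that some constant exists.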

Principle 11.2 Linear recursions Suppose that $c_i(n) \to C_i$ as $n \to \infty$ and that at least
one $C_i$ is nonzero. Let r be the largest root of (11.35). Then $r^n$ is probably a fairly reasonable
crude approximation to the solution $a_n$ of (11.33) that satisfies our (unspecified) initial
conditions. Usually

$$\lim_{n\to\infty} (a_n)^{1/n} = r.$$

Example 11.22 Involutions Let $a_n$ be the number of permutations of $\underline{n}$ which are involutions,
that is, no cycle lengths exceed two. Either n is in a cycle with itself or it forms a cycle with one
of the remaining n − 1 elements of $\underline{n}$. Thus

$$a_n = a_{n-1} + (n-1)a_{n-2},$$

with some appropriate initial conditions.

The coefficients of this recursion are not almost constant, but we can use a trick, which works
whenever we have coefficients which are polynomials in n. Let $b_n = a_n/(n!)^d$, where d is to be
determined. Dividing our recursion by $(n!)^d$ and doing a little simple algebra, we have

$$b_n = \frac{1}{n^d}\,b_{n-1} + \frac{n-1}{(n^2-n)^d}\,b_{n-2}.$$

If d < 1/2, the last coefficient is unbounded, while if d > 1/2, both coefficients on the right side
approach 0. On the other hand, with d = 1/2, the first coefficient approaches 0 and the second
approaches 1. Thus we are led to consider the recursion $b_n = b_{n-2}$ and hence the roots of the
polynomial equation $r^2 = 1$. Since the largest is r = 1, Principle 11.2 suggests that $b_n$ is, very crudely,
about $r^n = 1$. Thus $(n!)^{1/2}$ is a
rough approximation to $a_n$. We can eliminate the factorial by using Stirling's formula (Theorem 1.5
(p. 12)). Since the approximation in Principle 11.2 is so crude we may as well ignore factors like
$\sqrt{2\pi n}$ and simply say that $a_n$ probably grows roughly like $(n/e)^{n/2}$. □
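
A quick numerical look at this example, as a sketch only: the initial conditions $a_0 = a_1 = 1$ are an assumption (the text leaves them unspecified), and the function names are hypothetical. The code computes $a_n$ from the recursion and then $(b_n)^{1/n}$ with $b_n = a_n/(n!)^{1/2}$, which Principle 11.2 suggests should approach r = 1.

```python
import math

def involutions(n_max):
    """Return [a_0, ..., a_{n_max}] with a_0 = a_1 = 1 and a_n = a_{n-1} + (n-1) a_{n-2}."""
    a = [1, 1]
    for n in range(2, n_max + 1):
        a.append(a[n - 1] + (n - 1) * a[n - 2])
    return a[: n_max + 1]

def nth_root_of_b(n, a):
    """(b_n)^(1/n) where b_n = a_n / (n!)^(1/2), via logarithms to avoid overflow."""
    log_b = math.log(a[n]) - 0.5 * math.lgamma(n + 1)
    return math.exp(log_b / n)

a = involutions(2000)
for n in (10, 100, 1000, 2000):
    print(n, nth_root_of_b(n, a))
```

The values drift slowly down toward 1, which matches the warning that the approximation is crude: $b_n$ itself is not close to 1, only its n-th root is.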


We  now  present  a  type  of  recursion  that  often  arises  in  divide  and  conquer  problems  such  as 
Mergesort.  Some  authors  have  called  various  forms  of  the  theorem  associated  with  this  principle  a 
master  theorem  for  recursions.  The  reason  we  have  put  "function"  in  quotes  is  explained  after  the 
principle. 

Principle 11.3 Master Principle for Recursions We want to study the "function" T(n).
Suppose that for some "functions" $f(n), s_1(n), \ldots, s_w(n)$, some N, and some 0 < c < 1 we have

(i) T(n) > 0 for n > N,

(ii) $f(n) \ge 0$ for n > N,

(iii) $s_i(n) - cn \in O(1)$ for $1 \le i \le w$,

(iv) $T(n) = f(n) + T(s_1(n)) + T(s_2(n)) + \cdots + T(s_w(n))$.

Let $b = \log w / \log(1/c)$. Then

(a) if $f(n)/n^b \to \infty$ as $n \to \infty$, then we usually have $T(n) \in \Theta(f(n))$;

(b) if $f(n)/n^b \to 0$ as $n \to \infty$, then we usually have $T(n) \in \Theta(n^b)$;

(c) if $n^b \in \Theta(f(n))$, then we usually have $T(n) \in \Theta(n^b \log n)$.

The principle says that T(n) grows like the faster growing of f(n) and $n^b$ unless they grow at the
same rate, in which case T(n) grows faster by a factor of log n.

Why is "function" in quotes? Consider merge sorting. We would like T(n) to be the number of
comparisons needed to merge sort an n-long list. This is not a well-defined function! The number
depends on the order of the items in the original list. (See Example 7.13 (p. 211).) Thus, we want
to let T(n) stand for the set of all possible values for the number of comparisons that are required.
Hence T(n) is really a collection of functions. Similarly f(n) and, more rarely, the $s_i(n)$ may be
collections of functions. Thus, a statement like $T(n) \in \Theta(f(n))$ should be interpreted as meaning
that the statement is true for all possible values of T(n) and f(n).
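
As a hedged numerical illustration of case (c), take one member of the merge sort collection: assume that merging two lists with n items in total costs f(n) = n − 1 comparisons (a worst-case assumption made for this sketch, not a statement from the text). Then w = 2, c = 1/2 and b = 1, so $n^b \in \Theta(f(n))$ and the principle predicts $T(n) \in \Theta(n \log n)$.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """T(n) = (n - 1) + T(floor(n/2)) + T(ceil(n/2)) with T(1) = 0."""
    if n <= 1:
        return 0
    return (n - 1) + T(n // 2) + T(n - n // 2)

for k in (4, 8, 12, 16, 20):
    n = 2 ** k
    print(n, T(n) / (n * math.log2(n)))   # stays bounded, as Theta(n log n) requires
```

For n a power of 2 this recursion can be solved exactly, giving $T(n) = n\log_2 n - n + 1$, so the printed ratios increase slowly toward 1.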

Example 11.23 Recursive multiplication of polynomials Suppose we want to multiply
two polynomials of degree at most n, say

$$P(x) = p_0 + p_1x + \cdots + p_nx^n \quad\text{and}\quad Q(x) = q_0 + q_1x + \cdots + q_nx^n.$$

A common method for doing this is to use the distributive law to generate $(n+1)^2$ products $p_0q_0$,
$p_0q_1x$, $p_0q_2x^2$, ..., $p_nq_nx^{2n}$ and then collect the terms that have the same powers of x. This involves
$(n+1)^2$ multiplications of coefficients and, it can be shown, $n^2$ additions of coefficients. Thus, the
amount of work is $\Theta(n^2)$.

Is this the best we can do? Of course, we can do better if P(x) or Q(x) have some coefficients
that are zero. Since we're concerned with finding a general algorithm, we'll ignore this possibility.
There is a recursive algorithm which is faster. It uses the following identity, which you should check.

Identity: If $P_L(x)$, $P_H(x)$, $Q_L(x)$ and $Q_H(x)$ are polynomials, then

$$\bigl(P_L(x) + P_H(x)x^m\bigr)\bigl(Q_L(x) + Q_H(x)x^m\bigr) = A(x) + \bigl(C(x) - A(x) - B(x)\bigr)x^m + B(x)x^{2m},$$

where

$$A(x) = P_L(x)Q_L(x), \qquad B(x) = P_H(x)Q_H(x),$$

and

$$C(x) = \bigl(P_L(x) + P_H(x)\bigr)\bigl(Q_L(x) + Q_H(x)\bigr).$$

We can think of this identity as telling us how to multiply two polynomials P(x) and Q(x) by
splitting them into lower degree terms (the polynomials $P_L(x)$ and $Q_L(x)$) and higher degree terms
(the polynomials $P_H(x)x^m$ and $Q_H(x)x^m$):

$$P(x) = P_L(x) + P_H(x)x^m \quad\text{and}\quad Q(x) = Q_L(x) + Q_H(x)x^m.$$

The identity requires three polynomial multiplications to compute A(x), B(x) and C(x). Since
the degrees involved are only about half the degrees of the original polynomials, we've gone from
about $n^2$ multiplications to about $3(n/2)^2 = 3n^2/4$, an improvement by a constant factor as $n \to \infty$.
When this happens, applying an algorithm recursively usually gives an improvement in the exponent.
In this case, we expect $\Theta(n^d)$ for some d < 2 instead of $n^2$.

Here's a recursive algorithm for multiplying two polynomials $P(x) = p_0 + p_1x + \cdots + p_nx^n$ and
$Q(x) = q_0 + q_1x + \cdots + q_nx^n$ of degree at most n.

MULT(P(x), Q(x), n)
    If (n = 0) Return p_0 q_0
    Else
        Let m = n/2 rounded up.
        P_L(x) = p_0 + p_1 x + ... + p_{m-1} x^{m-1}
        P_H(x) = p_m + p_{m+1} x + ... + p_n x^{n-m}
        Q_L(x) = q_0 + q_1 x + ... + q_{m-1} x^{m-1}
        Q_H(x) = q_m + q_{m+1} x + ... + q_n x^{n-m}
        A(x) = MULT(P_L(x), Q_L(x), m-1)
        B(x) = MULT(P_H(x), Q_H(x), n-m)
        C(x) = MULT(P_L(x) + P_H(x), Q_L(x) + Q_H(x), n-m)
        D(x) = A(x) + (C(x) - A(x) - B(x)) x^m + B(x) x^{2m}
        Return D(x)
    End if
End
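
The procedure above can be sketched in Python as follows. This is one possible rendering under stated assumptions rather than a definitive implementation: polynomials are lists of coefficients with the constant term first, both inputs are assumed to have the same length n + 1, and the helper names `add`, `sub` and `mult` are chosen for this example. The function also returns the number of coefficient multiplications it performed.

```python
def add(p, q):
    """Coefficientwise sum of two coefficient lists of possibly different lengths."""
    if len(p) < len(q):
        p, q = q, p
    return [a + (q[i] if i < len(q) else 0) for i, a in enumerate(p)]

def sub(p, q):
    """Coefficientwise difference p - q; assumes len(p) >= len(q)."""
    return [a - (q[i] if i < len(q) else 0) for i, a in enumerate(p)]

def mult(p, q):
    """Return (coefficients of P*Q, number of coefficient multiplications).

    p and q are coefficient lists of equal length n + 1, constant term first.
    """
    n = len(p) - 1
    if n == 0:
        return [p[0] * q[0]], 1
    m = (n + 1) // 2                         # n/2 rounded up
    pl, ph, ql, qh = p[:m], p[m:], q[:m], q[m:]
    a, count_a = mult(pl, ql)                # A(x), degree at most 2(m - 1)
    b, count_b = mult(ph, qh)                # B(x), degree at most 2(n - m)
    c, count_c = mult(add(pl, ph), add(ql, qh))
    mid = sub(sub(c, a), b)                  # C(x) - A(x) - B(x)
    d = [0] * (2 * n + 1)
    for shift, poly in ((0, a), (m, mid), (2 * m, b)):
        for i, coeff in enumerate(poly):
            d[shift + i] += coeff
    return d, count_a + count_b + count_c

print(mult([1, 2, 3], [4, 5, 6]))
```

For these two quadratics the sketch uses 7 coefficient multiplications, compared with the 9 needed by the distributive-law method.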

We store a polynomial as an array of coefficients. The amount of work done in calculation is the
number of times we multiply or add two coefficients.

Let's count multiplications. Since a polynomial of degree n has n + 1 coefficients (constant
term to coefficient of $x^n$), we'll denote by T(n + 1) the number of multiplications for a polynomial
of degree n. To multiply two constants, we have T(1) = 1. You should be able to show that the
recursive part of the algorithm gives us T(n) = T(m) + T(n − m) + T(n − m), where $m = \lfloor n/2 \rfloor$, the
largest integer not exceeding n/2. The Master Principle applies with w = 3, f(n) = 0, $s_1(n) = \lfloor n/2 \rfloor$,
$s_2(n) = s_3(n) = n - \lfloor n/2 \rfloor = \lceil n/2 \rceil$,¹ and c = 1/2. Thus b = log 3/log 2 and $T(n) \in \Theta(n^{\log 3/\log 2})$.

Since log 3/log 2 is about 1.6, which is less than 2, this is less work than the roughly $n^2$ multiplications in
the common method when n is large.

What about additions and subtractions? The polynomials A(x), B(x) and C(x) all have degree
about n. Thus, computing C(x) − A(x) − B(x) requires about 2n subtractions. The polynomials
A(x) and $(C(x) - A(x) - B(x))x^m$ overlap in about n/2 terms and so adding them together requires
about n/2 additions. Adding in $B(x)x^{2m}$ requires about n/2 more additions. Thus, there are about
3n additions and subtractions involved in computing D(x) from A, B and C. If U(n) is the number
of additions and subtractions in the recursive algorithm for polynomials of degree n − 1,

$$U(n) = f(n) + U(m) + U(n-m) + U(n-m) \quad\text{where}\quad f(n) \in \Theta(3n).$$

By the Master Principle with b = log 3/log 2, $U(n) \in \Theta(n^b)$. Thus the algorithm requires
$\Theta(n^{\log 3/\log 2})$ multiplications, additions and subtractions. □


Sums  of  Positive  Terms 


Suppose that

$$a_n = \sum_{k=0}^{L_n} t_{n,k},$$

where $t_{n,k} > 0$ and $L_n \to \infty$. Imagine n fixed and think of $t_{n,k}$ as a sequence in k; i.e., $t_{n,0}, t_{n,1}, \ldots$.
Let $r_k(n) = t_{n,k+1}/t_{n,k}$, the ratio of consecutive terms. Usually we simply write $r_k$ for $r_k(n)$. In
practice we usually have one of four situations:

(a) Decreasing terms ($r_k < 1$ for all k). We will study this.

(b) Increasing terms ($r_k > 1$ for all k). Convert to (a) by writing the sequence backwards:

$$a_n = \sum_{k=0}^{L_n} t_{n,k} = \sum_{i=0}^{L_n} t_{n,L_n-i} = \sum_{k=0}^{L_n} s_{n,k},$$

where $s_{n,k} = t_{n,L_n-k}$.

(c) Increasing, then decreasing ($r_k > 1$ for $k < K_n$ and $r_k < 1$ for $k \ge K_n$): split the sum at
$K_n$. This gives one sum like (a) and one like (b):

$$\sum_{k=0}^{L_n} t_{n,k} = \sum_{k=0}^{K_n-1} t_{n,k} + \sum_{k=0}^{M_n} u_{n,k},$$

where $M_n = L_n - K_n$ and $u_{n,k} = t_{n,k+K_n}$.

(d) Decreasing, then increasing ($r_k < 1$ for $k < K_n$ and $r_k > 1$ for $k \ge K_n$). Split into two as
done for (c).

Suppose we are dealing with (a), decreasing terms, and that $\lim_{n\to\infty} r_k(n) = r$ exists for each
k and does not depend on k. This may sound unusual, but it is quite common. If r = 1, we will
call the terms slowly decreasing. If |r| < 1, we will call the terms rapidly decreasing. The two sums
obtained from Case (c) are almost always slowly decreasing and asymptotically the same.

¹ The ceiling function $\lceil x \rceil$ is the least integer not less than x.


Principle 11.4 Sums of rapidly decreasing terms If there is an r with 0 < r < 1
such that $\lim_{n\to\infty} t_{n,k+1}/t_{n,k} = r$ for each value of k, then we usually have a geometric series
approximation:

$$\sum_{k=0}^{L_n} t_{n,k} \sim \sum_{k \ge 0} t_{n,0}\,r^k = \frac{t_{n,0}}{1-r}.$$

(Note that r does not depend on k.)


Aside: Actually, there is a more general principle: If $r_k(n) \to \rho_k$ as $n \to \infty$ and there is an R < 1
such that $|\rho_k| < R$ for all k, then the sum is asymptotic to $t_{n,0} \sum_{k \ge 0} \rho_0\rho_1 \cdots \rho_{k-1}$. Principle 11.4 is the
special case $\rho_i = r$ for all i.


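Example 11.24 Small subsets How many subsets of an n-set have at most cn elements where
c < 1/2? You should have no trouble seeing that the answer is

$$\sum_{k=0}^{\lfloor cn \rfloor} \binom{n}{k}, \tag{11.37}$$

where $\lfloor cn \rfloor$ is the largest integer that does not exceed cn. Let's approximate the sum. Since the
terms are increasing, we reverse the order:

$$\sum_{k=0}^{\lfloor cn \rfloor} \binom{n}{\lfloor cn \rfloor - k}.$$

We have

$$\frac{t_{n,k+1}}{t_{n,k}} = \frac{\binom{n}{\lfloor cn \rfloor - k - 1}}{\binom{n}{\lfloor cn \rfloor - k}} = \frac{\lfloor cn \rfloor - k}{n - \lfloor cn \rfloor + k + 1}.$$

When k is small compared to n, this ratio is close to c/(1 − c) and so, by Principle 11.4, we expect

$$\sum_{k=0}^{\lfloor cn \rfloor} \binom{n}{k} \sim \frac{1}{1 - c/(1-c)} \binom{n}{\lfloor cn \rfloor} = \frac{1-c}{1-2c} \binom{n}{\lfloor cn \rfloor}.$$

A table comparing exact and approximate values is given in Figure 11.2. □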
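
The comparison is easy to reproduce. In the sketch below, the names `exact` and `approx` are assumptions made for this example, and c is passed as a `Fraction` so that $\lfloor cn \rfloor$ is computed without floating-point rounding. It evaluates the exact sum and the estimate $\frac{1-c}{1-2c}\binom{n}{\lfloor cn \rfloor}$ for c = 1/10.

```python
from fractions import Fraction
from math import comb

def exact(n, c):
    """E: number of subsets of an n-set with at most floor(c*n) elements."""
    top = (c.numerator * n) // c.denominator     # floor(c*n), computed exactly
    return sum(comb(n, k) for k in range(top + 1))

def approx(n, c):
    """A: the geometric-series estimate (1-c)/(1-2c) * C(n, floor(c*n))."""
    top = (c.numerator * n) // c.denominator
    return (1 - c) / (1 - 2 * c) * comb(n, top)

c = Fraction(1, 10)
for n in (10, 20, 50, 100, 200):
    print(n, float(approx(n, c)), exact(n, c), float(approx(n, c) / exact(n, c)))
```

With c = 1/10 and n = 10 this gives A = 11.25 and E = 11, and the ratio A/E moves toward 1 as n grows.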


We now look at sums with slowly decreasing terms. Such sums can usually be done by interpreting
the sum as an approximation to a Riemann sum associated with an integral. We'll merely
state the result for the most common case.

  c          n = 10     n = 20     n = 50          n = 100         n = 200

 0.1   A     11.25      213.75     2.384 × 10^6    1.947 × 10^13   1.815 × 10^27
       E     11         211        2.370 × 10^6    1.942 × 10^13   1.813 × 10^27
       A/E   1.023      1.01       1.006           1.003           1.002

 0.2   A     60         6460       1.370 × 10^10   7.146 × 10^20   2.734 × 10^42
       E     56         6196       1.343 × 10^10   7.073 × 10^20   2.719 × 10^42
       A/E   1.07       1.04       1.02            1.01            1.005

 0.4   A     630        377910     1.414 × 10^14   4.124 × 10^28   4.942 × 10^57
       E     386        263950     1.141 × 10^14   3.606 × 10^28   4.568 × 10^57
       A/E   1.6        1.4        1.2             1.14            1.08

Figure 11.2 The exact values (E) and approximate values (A) of the sum in (11.37) for c = 0.1, 0.2, 0.4
and n = 10, 20, 50, 100, 200. The ratios A/E are also given.


Principle 11.5 Sums of slowly decreasing terms Let $r_k(n) = t_{n,k+1}/t_{n,k}$ and suppose
$\lim_{n\to\infty} r_k(n) = 1$ for all k. Suppose there is a function f(n) > 0 with $\lim_{n\to\infty} f(n) = 0$ such
that, for all "fairly large" k, $(1 - r_k(n))/k \sim f(n)$ as $n \to \infty$. Then we usually have

$$\sum_{k \ge 0} t_{n,k} \sim \sqrt{\frac{\pi}{2f(n)}}\; t_{n,0}. \tag{11.38}$$

If the terms in the sum first increase and then decrease and the preceding applies to one half of
the sum, then the answer in (11.38) should be doubled.

"Fairly large" means that k is small compared to n but that k is large compared with constants. For
example, (k − 1)/kn can be replaced by 1/n.

Example  11.25  Subsets  of  equal  size  Suppose  j  is  constant.  How  many  ways  can  we  choose 
j  distinct  subsets  of  n  all  of  the  same  size?  Since  there  are  (^)  subsets  of  size  k,  you  should  have  no 
trouble  seeing  that  the  answer  is 

Since  the  binomial  coefficients  (^)  increase  to  a  maximum  at  A;  =  [n/2\  and  then  decrease,  the  same 
is  true  for  the  terms  in  the  sum.  Thus  we  are  in  Case  (c).  We  can  take  Kn  =  [n/2\  and  we  expect 
(11.38)  to  apply. 

For  simplicity,  let's  treat  n  as  if  it  is  even.  The  second  half  of  the  sum  is 


and  so 


$$\left(\frac{n-k-n/2}{k+1+n/2}\right)^{j} \approx \left(\frac{1-2k/n}{1+2k/n}\right)^{j} \approx e^{-4jk/n} \approx 1 - 4jk/n,$$

where the last approximations come from using $1 + x \approx e^x$ as $x \to 0$, first with $x = \pm 2k/n$ and then
with $x = -4jk/n$. Thus $(1 - r_k)/k \approx 4j/n$. So we take $f(n) = 4j/n$ in Principle 11.5. Remembering
that we need to double (11.38), we expect that


This  is  correct.  Q 
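
As a numerical sanity check on Principle 11.5, here is a minimal Python sketch. It assumes that the terms are $t_{n,k} = \binom{n}{k}^j$, which is the form whose ratio was approximated above; that choice, and the function names, are assumptions of the sketch rather than something taken from the text. It compares the exact sum with twice the right side of (11.38), using $f(n) = 4j/n$ and the largest term $\binom{n}{n/2}^j$.

```python
from math import comb, pi, sqrt

def exact_sum(n: int, j: int) -> int:
    """Sum of binom(n, k)**j over k = 0..n (assumed form of the terms)."""
    return sum(comb(n, k) ** j for k in range(n + 1))

def doubled_estimate(n: int, j: int) -> float:
    """Twice (11.38) with f(n) = 4j/n and largest term binom(n, n/2)**j (n even)."""
    f = 4 * j / n
    return 2 * sqrt(pi / (2 * f)) * comb(n, n // 2) ** j

def ratio(n: int, j: int) -> float:
    """Estimate divided by exact value; it should approach 1 as n grows."""
    return doubled_estimate(n, j) / exact_sum(n, j)

for n in (10, 40, 160):
    print(n, [round(ratio(n, j), 4) for j in (1, 2, 3)])
```

The printed ratios should drift toward 1 as n grows, which is the sense in which the principle "usually" gives the right answer.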


Example 11.26 Involutions In Theorem 2.2 (p. 48), we showed that the number of involutions
of n is

$$I_n = \sum_{k=0}^{\lfloor n/2 \rfloor} \frac{n!}{(n-2k)!\,2^k\,k!}.$$

It is not obvious where the maximum term is; however, we can find it by solving the equation
$t_{n,k+1}/t_{n,k} = 1$ for $k$. We have

$$\frac{t_{n,k+1}}{t_{n,k}} = \frac{(n-2k)(n-2k-1)}{2(k+1)} = 1. \qquad (11.39)$$


Clearing fractions in the right-hand equation leads to a quadratic equation for $k$ whose solution is
close to $m = (n - \sqrt{n})/2$. For simplicity, we assume that $m$ is an integer. Since the maximum is not
at the end, we expect to apply Principle 11.5. We split the sum into two pieces at $m$, one of which is

$$\sum_{k=0}^{\sqrt{n}/2} \frac{n!}{(n-2m-2k)!\,2^{m+k}\,(m+k)!}.$$

Adjusting (11.39) for this new index, we have

$$r_k(n) = \frac{t_{n,k+1}}{t_{n,k}} \approx \frac{(\sqrt{n}-2k)^2}{n-\sqrt{n}+2k}.$$

After some approximations and other calculations, we find that $r_k \approx 1 - (4k-1)/\sqrt{n}$. Thus $f(n) \sim
4/\sqrt{n}$ and so, doubling (11.38), we expect


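$$I_n \sim \frac{\sqrt{\pi/2}\; n^{1/4}\, n!}{(\sqrt{n})!\; 2^{(n-\sqrt{n})/2}\; \bigl((n-\sqrt{n})/2\bigr)!}.$$

If you wish, you can approximate this by using Stirling's formula. After some calculation, we obtain

$$I_n \sim \frac{n^{n/2}}{\sqrt{2}\; e^{n/2-\sqrt{n}+1/4}}. \qquad (11.40)$$

This is correct. Here is a comparison of values.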


n          5      20             50             100            200
(11.40)    23.6   2.241 × 10^10  2.684 × 10^34  2.340 × 10^82  3.600 × 10^192
I_n        26     2.376 × 10^10  2.789 × 10^34  2.405 × 10^82  3.672 × 10^192
ratio      0.91   0.94           0.96           0.97           0.98

The convergence is not very fast. It can be improved by making a more refined asymptotic estimate,
which we will not do. □
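
The comparison above can be reproduced with a short Python sketch. It evaluates the exact sum for the number of involutions and the right side of (11.40); the function names are illustrative choices, not notation from the text.

```python
from math import exp, factorial, sqrt

def involutions(n: int) -> int:
    """Exact count: sum over k of n! / ((n - 2k)! 2^k k!)."""
    return sum(
        factorial(n) // (factorial(n - 2 * k) * 2 ** k * factorial(k))
        for k in range(n // 2 + 1)
    )

def estimate_11_40(n: int) -> float:
    """Right side of (11.40): n^(n/2) / (sqrt(2) e^(n/2 - sqrt(n) + 1/4))."""
    return n ** (n / 2) / (sqrt(2) * exp(n / 2 - sqrt(n) + 0.25))

for n in (5, 20, 50, 100, 200):
    exact = involutions(n)
    print(n, exact, round(estimate_11_40(n) / exact, 3))
```

Each summand is an integer (it counts involutions with exactly k two-cycles), so integer division is exact here.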
Example 11.27 Set partitions We proved in Theorem 11.3 (p. 320) that the number of partitions
of an $n$-set equals

$$B_n = \frac{1}{e} \sum_{k=0}^{\infty} \frac{k^n}{k!}. \qquad (11.41)$$

We'll use Principle 11.5 (p. 346). Taking ratios to find the maximum term:

$$\frac{t_{n,k+1}}{t_{n,k}} = \frac{(1+1/k)^n}{k+1} \approx \frac{e^{n/k}}{k}.$$

The approximation comes from $1 + 1/k = \exp(\ln(1 + 1/k)) \approx e^{1/k}$. We want $e^{n/k}/k = 1$. Taking
logarithms, multiplying by $k$, and rearranging gives $k \ln k = n$. Let $s$ be the solution to $s \ln s = n$.
We split the sum at $\lfloor s \rfloor$. It is convenient (and does not affect the answer) to treat $s$ as if it is an
integer.

Unfortunately, the next calculations are quite involved. You may want to skip to the end of the
paragraph. We now have a sum that starts at $s$ and so, adjusting for shift of index,

$$r_k(n) = \frac{(1 + 1/(s+k))^n}{s+k+1} \approx \frac{e^{n/(s+k)}}{s+k} = \frac{e^{s \ln s/(s+k)}}{s+k}.$$

Since $1/(1+x) \approx 1 - x$ for small $x$ and since $s$ is large, we have $s/(s+k) = 1/(1+k/s) \approx 1 - k/s$. Using this in
our estimate for $r_k(n)$:

$$r_k(n) \approx \frac{e^{(1-k/s)\ln s}}{s+k} = \frac{s\,e^{-(k \ln s)/s}}{s+k} \approx \left(1 - \frac{k}{s}\right)\left(1 - \frac{k \ln s}{s}\right),$$

where we have used $1/(1+x) \approx 1 - x$ again and also $e^{-x} \approx 1 - x$ for small $x$. When $x$ and $y$ are small,
$(1-x)(1-y) \approx 1 - (x+y)$. In this case, $x = k/s$ is much smaller than $y = k(\ln s)/s$ so that $x + y \approx y$
and so we finally have

$$r_k(n) \approx 1 - \frac{k \ln s}{s},$$

and so $f(n) = (\ln s)/s$.

Remembering to double and to include the factor of $1/e$ from (11.41), we have (using Stirling's
formula on $s!$)

$$B_n \sim \frac{\sqrt{2\pi s/\ln s}\; s^n}{e\, s!} \sim \frac{s^{n-s}\, e^{s-1}}{\sqrt{\ln s}}, \qquad \text{where } s \ln s = n \text{ defines } s. \qquad (11.42)$$

Here's how (11.42) compares with the exact values.

n          5         20             50             100             200
s          3.76868   9.0703         17.4771        29.5366         50.8939
(11.42)    70.88     6.305 × 10^13  2.170 × 10^47  5.433 × 10^115  7.010 × 10^275
B_n        52        5.172 × 10^13  1.857 × 10^47  4.759 × 10^115  6.247 × 10^275
ratio      1.36      1.22           1.17           1.14            1.12
better     1.03      1.01           1.005          1.003           1.002

The approximation is quite poor. Had we used the factor of $\sqrt{1 + \ln s}$ in the denominator of (11.42),
the relative error would have been much better, as shown in the last line of the table. How did we
obtain the formula with $\sqrt{1 + \ln s}$? After obtaining the form with $\sqrt{\ln s}$ and noting how poor the
estimate was, we decided to look for a correction by trial and error. Often, a good way to do this is
by adjusting in a simple manner the part of the estimate that contains the smallest function of $n$; in
this case, the function $\ln s$. We tried $C + \ln s$ and found that $C = 1$ gave quite accurate estimates. □
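
The following Python sketch reproduces the set-partition comparison under stated assumptions: $s$ is found from $s \ln s = n$ by Newton's method, the exact values come from the Bell triangle (a standard recurrence, not the method of this example), and the `correction` parameter is a hypothetical name for the constant $C$ tried in $\sqrt{C + \ln s}$.

```python
from math import exp, log, sqrt

def solve_s(n: float) -> float:
    """Solve s ln s = n for s > 1 by Newton's method."""
    s = max(2.0, n / max(log(n), 1.0))
    for _ in range(100):
        s -= (s * log(s) - n) / (log(s) + 1.0)
    return s

def bell(n: int) -> int:
    """Exact number of partitions of an n-set, via the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
    return row[0]

def estimate_11_42(n: int, correction: float = 0.0) -> float:
    """s^(n-s) e^(s-1) / sqrt(correction + ln s); correction = 0 gives (11.42)."""
    s = solve_s(n)
    return exp((n - s) * log(s) + s - 1.0) / sqrt(correction + log(s))

for n in (5, 20, 50, 100, 200):
    exact = bell(n)
    print(n, round(solve_s(n), 4),
          round(estimate_11_42(n) / exact, 3),
          round(estimate_11_42(n, 1.0) / exact, 3))
```

The last two printed columns correspond to the "ratio" and "better" rows of the table.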
Generating Functions


In  order  to  study  asymptotic  estimates  from  generating  functions,  it  is  necessary  to  know  something 
about  singularities  of  functions.  Essentially,  a  singularity  is  a  point  where  the  function  misbehaves 
in  some  fashion.  The  singularities  that  are  encountered  in  combinatorial  problems  are  nearly  all  due 
to  either 

•  attempting  to  take  the  logarithm  of  zero, 

•  attempting  to  raise  zero  to  a  power  which  is  not  a  positive  integer, 

or both. For example, $-\ln(1 - x)$ has a singularity at $x = 1$. The power of zero requires a bit of
explanation. It includes the obviously bad situation of attempting to divide by zero; however, it
also includes things like attempting to take the square root of zero. For example, $\sqrt{1 - 4x}$ has a
singularity at $x = 1/4$. To explain why a nonintegral power of zero is bad would take us too far
afield. Suffice it to say that the fact that $A$ has two square roots everywhere except at $A = 0$ is
closely related to this problem.

The  following  is  stated  as  a  principle  because  we  need  to  be  more  careful  about  the  conditions  in 
order  to  have  a  theorem.  For  combinatorial  problems,  you  can  expect  the  more  technical  conditions 
to  be  satisfied. 

Principle 11.6 Nice singularities Let $a_n$ be a sequence whose terms are positive for all
sufficiently large $n$. Suppose that $A(x) = \sum_n a_n x^n$ converges for some value of $x > 0$. Suppose
that $A(x) = f(x)g(x) + h(x)$ where

• $f(x) = (-\ln(1 - x/r))^b\,(1 - x/r)^c$, $c$ is not a positive integer and we do not have $b = c = 0$;

• $A(x)$ does not have a singularity for $-r \le x < r$;

• $\lim_{x\to r} g(x)$ exists and is nonzero (call it $L$);

• $h(x)$ does not have a singularity at $x = r$.

Then it is usually true that

$$a_n \sim \begin{cases} \dfrac{L (\ln n)^b (1/r)^n}{n^{c+1}\,\Gamma(-c)}, & \text{if } c \ne 0; \\[2ex] \dfrac{b L (\ln n)^{b-1} (1/r)^n}{n}, & \text{if } c = 0; \end{cases} \qquad (11.43)$$

where $\Gamma$ is the Gamma function which we describe below.

The value of $r$ can be found by looking for the smallest (positive) singularity of $A(x)$. Often $g(r)$ is
defined and then $g(r) = L$. Since, in many cases there is no logarithm term (and so $b = 0$), you may
find it helpful to rewrite the principle for the special case $b = 0$.

The values of the Gamma function $\Gamma(x)$ can be looked up in tables. In practice, the only
information you are likely to need about this function is

$\Gamma(k) = (k-1)!$ when $k > 0$ is an integer, $\Gamma(x+1) = x\Gamma(x)$ and $\Gamma(1/2) = \sqrt{\pi}$.

For example, we compute $\Gamma(-1/2)$ by using $\Gamma(1/2) = (-1/2)\Gamma(-1/2)$:

$$\Gamma(-1/2) = -2\Gamma(1/2) = -2\sqrt{\pi}.$$
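
For readers who prefer to experiment numerically, the right side of (11.43) can be wrapped in a small Python function. This is a sketch, not part of the principle itself: the function name and argument order are assumptions, and the standard-library `math.gamma` stands in for a table of Gamma values.

```python
from math import gamma, log, pi, sqrt

def nice_singularity_estimate(L: float, b: float, c: float, r: float, n: int) -> float:
    """Right side of (11.43) for a_n, given L, b, c and r as in Principle 11.6."""
    if c == 0:
        # Case c = 0 of (11.43); b = c = 0 is excluded by the principle.
        return b * L * log(n) ** (b - 1) * (1 / r) ** n / n
    return L * log(n) ** b * (1 / r) ** n / (n ** (c + 1) * gamma(-c))

# Gamma(-1/2) = -2 sqrt(pi), as computed in the text.
print(gamma(-0.5), -2 * sqrt(pi))

# Special case b = 0: a square-root singularity at r = 1/4 with L = -1/2.
print(nice_singularity_estimate(L=-0.5, b=0, c=0.5, r=0.25, n=5))
```

With $b = 0$ the logarithmic factor is identically 1, which is the special case the text suggests writing out.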


We  begin  with  a  couple  of  simple  examples. 
Example 11.28 Derangements Let $D_n$ be the number of permutations of n that have no fixed
points. We showed in (11.17) that

$$\sum_{n=0}^{\infty} D_n \frac{x^n}{n!} = \frac{e^{-x}}{1-x}.$$

We apply Principle 11.6 with this as $A(x)$. There is a singularity at $x = 1$ because we are then
dividing by zero. Thus $r = 1$ and we have

$$A(x) = (1-x)^{-1} e^{-x} + 0, \quad\text{so}\quad f(x) = (1-x)^{-1}, \quad g(x) = e^{-x}, \quad\text{and}\quad h(x) = 0.$$

Thus $b = 0$ and $c = -1$. Since $g(1) = 1/e$, we have $D_n/n! \sim 1/e$. In Example 4.5 (p. 99) we used an
explicit formula for $D_n$ to show that $D_n$ is the closest integer to $n!/e$. We get no such error estimate
with this approach. □

Example 11.29 Rational generating functions Suppose that $A(x) = \sum_n a_n x^n = p(x)/q(x)$
where $a_n > 0$ and $p(x)$ and $q(x)$ are polynomials such that

• $r$ is the smallest zero of $q(x)$; that is, $q(r) = 0$ and $q(s) \ne 0$ if $0 < s < r$;

• the multiplicity of $r$ as a zero is $k$; that is, $q(x) = s(x)(1 - x/r)^k$ where $s(x)$ is a polynomial and
$s(r) \ne 0$;

• $p(r) \ne 0$.

We can apply Principle 11.6 with $f(x) = (1 - x/r)^{-k}$ and $g(x) = p(x)/s(x)$. Since $r$ is a zero of
$q(x)$ of multiplicity $k$, it follows that this gives an asymptotic formula for $a_n$:

$$a_n \sim \frac{p(r)\, n^{k-1} (1/r)^n}{s(r)\,(k-1)!}. \qquad (11.44)$$

We leave it to you to apply this formula to various rational generating functions that have appeared
in the text. □
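
As one illustration of (11.44), the sketch below treats the case of a simple zero ($k = 1$) and tries it on $x/(1 - x - x^2)$, the generating function of the Fibonacci numbers. The example function and all names are choices made for this sketch. It uses the identity $s(r) = -r\,q'(r)$, which follows from differentiating $q(x) = s(x)(1 - x/r)$ and setting $x = r$.

```python
from math import sqrt

def fib(n: int) -> int:
    """Fibonacci numbers with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def estimate_simple_zero(p, dq, r: float, n: int) -> float:
    """(11.44) with k = 1: a_n ~ p(r) (1/r)^n / s(r), where s(r) = -r q'(r)."""
    s_r = -r * dq(r)
    return p(r) * (1 / r) ** n / s_r

# A(x) = x / (1 - x - x^2): p(x) = x, q'(x) = -1 - 2x, smallest zero r = (sqrt(5) - 1)/2.
r = (sqrt(5) - 1) / 2
for n in (5, 10, 20):
    print(n, fib(n), estimate_simple_zero(lambda x: x, lambda x: -1 - 2 * x, r, n))
```

For a zero of higher multiplicity, $s(r)$ would have to be obtained from a higher derivative of $q$; the sketch does not attempt that.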

Example 11.30 The average time for Quicksort Let $q_n$ be the average number of comparisons
that are needed to Quicksort $n$ long lists. In Example 10.12 (p. 289) we obtained the ordinary
generating function

$$Q(x) = \frac{-2\ln(1-x) - 2x}{(1-x)^2}.$$

We can take $A(x) = Q(x)$, $r = 1$, $f(x) = -\ln(1-x)\,(1-x)^{-2}$ and $g(x) = 2 + (2x/\ln(1-x))$. Then
$\lim_{x\to 1} g(x) = 2$. From (11.43),

$$q_n \sim \frac{2 \ln n}{n^{-1}\,\Gamma(2)} = 2n \ln n, \qquad (11.45)$$

which we also obtained in (10.30) by manipulating an explicit expression for $q_n$. Here are some
numerical results.

n          5       20      50      100     1000
(11.45)    16.09   119.8   391.2   921.0   13816.
q_n        7.4     71.11   258.9   647.9   10986.
ratio      2.2     1.7     1.5     1.4     1.3

The approximation converges very slowly. You might like to experiment with adjusting (11.45) to
improve the estimate.

An alternative approach is to write $Q(x)$ as a difference of two functions and deal with each
separately: $Q(x) = Q_1(x) - Q_2(x)$ where

$$Q_1(x) = \frac{-2\ln(1-x)}{(1-x)^2} \qquad\text{and}\qquad Q_2(x) = \frac{2x}{(1-x)^2}.$$

Now $f_1(x) = -\ln(1-x)\,(1-x)^{-2}$, $f_2(x) = (1-x)^{-2}$, $g_1(x) = 2$, $g_2(x) = 2x$ and $h_1(x) = h_2(x) = 0$.

We obtain $q_{1,n} \sim 2n \ln n$ as before and $q_{2,n} \sim 2n$. Subtracting we again obtain $q_n \sim 2n \ln n$. □
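
The slow convergence is easy to see numerically. The Python sketch below reads the exact coefficient $q_n$ off the generating function by multiplying the two series $-2\ln(1-x) - 2x = \sum_{k\ge 2} (2/k)x^k$ and $(1-x)^{-2} = \sum_{m\ge 0} (m+1)x^m$; the function names are illustrative.

```python
from math import log

def q_exact(n: int) -> float:
    """Coefficient of x^n in (-2 ln(1-x) - 2x) / (1-x)^2, by series convolution."""
    # The term (2/k) x^k pairs with (n - k + 1) x^(n-k) from (1-x)^(-2).
    return sum((2 / k) * (n - k + 1) for k in range(2, n + 1))

def q_estimate(n: int) -> float:
    """The estimate (11.45): 2 n ln n."""
    return 2 * n * log(n)

for n in (5, 20, 50, 100, 1000):
    print(n, round(q_estimate(n), 2), round(q_exact(n), 2),
          round(q_estimate(n) / q_exact(n), 2))
```

Subtracting the $2n$ contributed by $Q_2(x)$ from the estimate is one simple adjustment worth trying in the same loop.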

So far we have dealt with generating functions that required relatively little algebra to obtain
the asymptotics. We now look at a more complicated situation.

Example 11.31 Binary operations In Example 11.3 (p. 312) we studied $z_n$, the number of
ways a string of $n$ zeroes can be parenthesized to give zero, and obtained the ordinary generating
function

$$Z(x) = \frac{T(x) - 1 + \sqrt{(1 - T(x))^2 + 4x}}{2}, \qquad\text{where}\qquad T(x) = \frac{1 - \sqrt{1-4x}}{2}$$

is the ordinary generating function for all ways to parenthesize the string. (See (11.12) and (11.14).)

To gear up for studying $Z(x)$, we begin with the simpler $T(x)$. Of course, $T(x)$ could easily be
studied by using the explicit formula for its coefficients, $t_n = \frac{1}{n}\binom{2n-2}{n-1}$; however, the point is to
understand how to handle the square root singularity. The square root has a singularity at $r = 1/4$
since it vanishes there. Thus we write

$$T(x) = f(x)g(x) + h(x), \quad\text{where}\quad f(x) = (1 - x/r)^{1/2}, \quad g(x) = -1/2 \quad\text{and}\quad h(x) = 1/2.$$

From (11.43) we obtain

$$t_n = a_n \sim (-1/2)\,\frac{n^{-3/2}}{\Gamma(-1/2)}\,(1/4)^{-n} = \frac{4^{n-1}}{\sqrt{\pi n^3}}, \qquad (11.46)$$

since $(-1/2)\Gamma(-1/2) = \Gamma(1/2) = \sqrt{\pi}$.

We're ready for $Z(x)$. Since $r = 1/4$ is a singularity of $T(x)$, it is also a singularity of $Z(x)$. We
can have other singularities when the complicated square root is zero; that is, $(1 - T(x))^2 + 4x = 0$.
We have

$$(1 - T(x))^2 + 4x = \frac{1 + 2\sqrt{1-4x} + (1-4x)}{4} + 4x = \frac{1 + 6x + \sqrt{1-4x}}{2}.$$

For this to be zero we must have

$$1 + 6x = -\sqrt{1-4x}. \qquad (11.47)$$

Squaring: $1 + 12x + 36x^2 = 1 - 4x$ and so $16x + 36x^2 = 0$. The two solutions are $x = 0$ and $x = -4/9$.
Since squaring can introduce false solutions, we'll have to check them in (11.47). The value $x = 0$
is not a solution to (11.47) and the value $x = -4/9$ is irrelevant since it does not lie in the interval
$[-1/4, 1/4]$. Thus $r = 1/4$.

How can we find $f$, $g$ and $h$? It seems likely that $f(x) = \sqrt{1-4x}$ since we have $T(x)$ present in
the numerator of $Z(x)$. Then $Z(r) = f(r)g(r) + h(r) = h(r)$ since $f(1/4) = 0$. We could simply try
to let $h(x)$ be the constant $Z(r)$, which is $(-1 + \sqrt{5})/4$. Thus we have

$$f(x) = \sqrt{1-4x}, \qquad h(x) = Z(1/4) = \frac{-1+\sqrt{5}}{4}, \qquad g(x) = \frac{Z(x) - Z(1/4)}{\sqrt{1-4x}}.$$

We need to find $L$. To simplify expressions, let $s = \sqrt{1-4x}$. We have

$$L = \lim_{x\to 1/4} \left( \frac{-s + \sqrt{(1+s)^2 + 16x} - \sqrt{5}}{4s} \right). \qquad (11.48)$$
n            5       10      20            30             40             50
(11.46)      12.92   4676.   1.734 × 10^9  9.897 × 10^14  6.740 × 10^20  5.057 × 10^26
t_n          14      4862    1.767 × 10^9  1.002 × 10^15  6.804 × 10^20  5.096 × 10^26
t ratio      0.92    0.96    0.98          0.987          0.991          0.992
(11.49)      3.571   1293.   4.792 × 10^8  2.735 × 10^14  1.863 × 10^20  1.398 × 10^26
z_n          5       1381    4.980 × 10^8  2.806 × 10^14  1.899 × 10^20  1.419 × 10^26
z ratio      0.7     0.94    0.96          0.97           0.98           0.985
z_n/t_n      0.36    0.284   0.282         0.280          0.279          0.279

Figure 11.3 Asymptotic and exact values for $t_n$ and $z_n$ in Example 11.31. The ratios of asymptotic to
exact values are given and the ratio $z_n/t_n$, which we have shown should approach $(5 - \sqrt{5})/10 = 0.276393$.


How can we evaluate the limit? Since numerator and denominator approach zero, we can apply
l'Hôpital's Rule (a proof can be found in any rigorous calculus text):


Theorem 11.10 l'Hôpital's Rule Suppose that $\lim_{x\to a} f(x) = 0$, $\lim_{x\to a} g(x) = 0$, and
$g(x) \ne 0$ when $0 < |x - a| < \delta$ for some $\delta > 0$. Then

$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)},$$

provided the latter limit exists.

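We leave it to you to apply l'Hôpital's Rule and some algebra to evaluate the limit. We'll use a
trick: rewrite everything in terms of $s$ and then use l'Hôpital's Rule. Since $16x = 4(1 - s^2)$,

$$L = \lim_{s\to 0} \left( \frac{-s + \sqrt{(1+s)^2 + 4(1-s^2)} - \sqrt{5}}{4s} \right)$$

$$= \lim_{s\to 0} \left( \frac{-s + \sqrt{5 + 2s - 3s^2} - \sqrt{5}}{4s} \right)$$

$$= \lim_{s\to 0} \left( \frac{-1 + (1/2)(5 + 2s - 3s^2)^{-1/2}(2 - 6s)}{4} \right) \qquad\text{by l'Hôpital's Rule}$$

$$= \frac{-1 + 5^{-1/2}}{4} = \frac{\sqrt{5} - 5}{20}.$$

We finally have

$$z_n \sim \frac{\sqrt{5} - 5}{20} \left( \frac{n^{-3/2}}{\Gamma(-1/2)} \right) (1/4)^{-n} = \frac{5 - \sqrt{5}}{10}\, \frac{4^{n-1}}{\sqrt{\pi n^3}}. \qquad (11.49)$$

Comparing the result with the asymptotic formula for $t_n$, the total number of ways to parenthesize
the sequence $0 \wedge \cdots \wedge 0$, we see that the fraction of parenthesizations that lead to a value of zero
approaches $(5 - \sqrt{5})/10$ as $n \to \infty$. Various comparisons are shown in Figure 11.3. □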
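
The exact values of $z_n$ can be recovered from the generating function alone, which gives an independent check on (11.49). The Python sketch below does this with exact rational arithmetic: it uses $\sqrt{1-4x} = 1 - 2T(x)$ to write the quantity under the square root as $1 + 3x - T(x)$, and then extracts a power-series square root term by term. The function names and the series approach are choices made for this sketch, not the method of Example 11.3.

```python
from fractions import Fraction
from math import comb, pi, sqrt

def t_exact(n: int) -> int:
    """t_n = binom(2n-2, n-1) / n for n >= 1."""
    return comb(2 * n - 2, n - 1) // n

def z_exact(n_max: int) -> list:
    """Coefficients z_0..z_n_max of Z(x) = (T(x) - 1 + sqrt((1-T(x))^2 + 4x)) / 2."""
    t = [0] + [t_exact(n) for n in range(1, n_max + 1)]
    # W(x) = (1 - T(x))^2 + 4x = (1 + 6x + sqrt(1-4x)) / 2 = 1 + 3x - T(x).
    w = [Fraction(-t[n]) for n in range(n_max + 1)]
    w[0] += 1
    if n_max >= 1:
        w[1] += 3
    # Power-series square root R(x) of W(x): R(x)^2 = W(x) with R(0) = 1.
    root = [Fraction(1)] + [Fraction(0)] * n_max
    for n in range(1, n_max + 1):
        cross = sum(root[i] * root[n - i] for i in range(1, n))
        root[n] = (w[n] - cross) / 2
    z = [(t[n] + root[n]) / 2 for n in range(n_max + 1)]
    z[0] = (t[0] - 1 + root[0]) / 2
    return [int(v) for v in z]

def z_estimate(n: int) -> float:
    """Right side of (11.49)."""
    return (5 - sqrt(5)) / 10 * 4 ** (n - 1) / sqrt(pi * n ** 3)

z = z_exact(50)
for n in (5, 10, 20, 30, 40, 50):
    print(n, z[n], round(z_estimate(n) / z[n], 3), round(z[n] / t_exact(n), 3))
```

The last printed column is the ratio $z_n/t_n$, which should move slowly toward 0.276393.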
Example 11.32 Three misfits Although Principle 11.6 applies to many situations of interest
in combinatorics, there are also many cases where it fails to apply. Here are three examples.

• No singularities: The EGF for the number of involutions of n is $\exp(x + x^2/2)$. This function
has no singularities, so our principle fails.

• Zero radius of convergence: The number of simple graphs with vertex set n is $g_n = 2^N$ where
$N = \binom{n}{2}$. By the Exponential Formula (Theorem 11.5 (p. 321)), the EGF for the number of
connected graphs is $\ln(G(x))$, where $G(x)$ is the EGF for $g_n$. Unfortunately, $G(x)$ only converges
at $x = 0$ so our principle fails.

• Bad singularities: Let $p(n)$ be the number of partitions of the integer $n$. At the end of
Example 10.15 (p. 295) we showed that the ordinary generating function for $p(n)$ is

$$P(x) = \prod_{i=1}^{\infty} (1 - x^i)^{-1}.$$

Clearly $r = 1$. Unfortunately, for every real number $c$, $P(x)/(1-x)^c \to \infty$ as $x \to 1$. Thus our
principle does not apply; however, asymptotic information about $p(n)$ can be deduced from the
formula for $P(x)$. The methods are beyond this text. (Actually, $P(x)$ behaves even worse than
we indicated: it has a singularity for all $x$ on the unit circle.) □

So far we have dealt with generating functions that are given by an explicit formula. This does
not always happen. For example, the ordinary generating function for rooted unlabeled full binary
trees by number of leaves is given implicitly by

$$T(x) = x + \frac{T(x)^2 + T(x^2)}{2}.$$

Although any specific $t_n$ can be found, there is no known way to solve for the function $T(x)$. The
following principle helps with many such situations. Unfortunately, the validity of its conclusion is
a bit shakier than (11.43).

Principle 11.7 Implicit functions Let $a_n$ be a sequence whose terms are positive for all
sufficiently large $n$. Let $A(x)$ be the ordinary generating function for the $a_n$'s. Suppose that the
function $F(x,y)$ is such that $F(x, A(x)) = 0$. If there are positive real numbers $r$ and $s$ such
that $F(r,s) = 0$ and $F_y(r,s) = 0$ and if $r$ is the smallest such $r$, then it is usually true that

$$a_n \sim \sqrt{\frac{r F_x(r,s)}{2\pi F_{yy}(r,s)}}\; n^{-3/2}\, r^{-n}. \qquad (11.50)$$


Example 11.33 RP-trees with degree restrictions Let $D$ be a set of nonnegative integers
that contains 0. In Exercise 10.4.11 (p. 301), we said an RP-tree was of outdegree $D$ if the number of
sons of each vertex lies in $D$. Let $T_D(x)$ be the ordinary generating function for unlabeled RP-trees
of outdegree $D$ by number of vertices. In Exercise 10.4.11, you were asked to show that

$$T_D(x) = x \sum_{d \in D} T_D(x)^d.$$

It can be shown that $t_D(n) > 0$ for all sufficiently large $n$ if and only if

$$\gcd\{d : d \in D\} = 1.$$

We invite you to prove this by looking at the equation for $g(x) = T_D(x)/x$ and using the fact that all
sufficiently large multiples of the gcd of a set $S$ can be expressed as a nonnegative linear combination
of the elements of $S$.
Let $F(x,y) = y - x \sum_{d \in D} y^d$. Then $r$ and $s$ in Principle 11.7 are found by solving the equations

$$s - r \sum_{d \in D} s^d = 0 \qquad\text{and}\qquad 1 - r \sum_{d \in D} d\, s^{d-1} = 0.$$

With a bit of algebra, we can get an equation in $s$ alone which can be solved numerically and then
used in a simple equation to find $r$:

$$1 = \sum_{d \in D,\ d > 0} (d-1)\, s^d, \qquad (11.51)$$

$$r = \frac{s}{\sum_{d \in D} s^d}. \qquad (11.52)$$


Finally

$$\frac{F_x(r,s)}{F_{yy}(r,s)} = \frac{\sum_{d \in D} s^d}{r \sum_{d \in D} d(d-1)\, s^{d-2}}. \qquad (11.53)$$

Since the right side of (11.51) is a sum of positive terms, it is a simple matter to solve it for the
unique positive solution to any desired accuracy. This result can then be used in (11.52) to find $r$
accurately. Finally, the results can be used in (11.53) and (11.50) to estimate $t_D(n)$. □
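
The numerical procedure just described can be sketched in a few lines of Python. Everything below is an illustrative assumption rather than the text's own computation: $s$ is found by bisection from $F(r,s) = 0$ and $F_y(r,s) = 0$ with $F(x,y) = y - x\sum_{d\in D} y^d$, the estimate applies (11.50) directly to the partial derivatives of $F$, and exact counts are obtained by iterating the functional equation on truncated power series. The set $D$ must contain some $d \ge 2$, or the bisection has no root to find.

```python
from math import pi, sqrt

def solve_s(D, tol: float = 1e-14) -> float:
    """Positive root of sum over d in D, d > 0, of (d - 1) s^d = 1, by bisection."""
    g = lambda s: sum((d - 1) * s ** d for d in D if d > 0) - 1
    lo, hi = 0.0, 1.0
    while g(hi) < 0:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def estimate(D, n: int) -> float:
    """(11.50) applied to F(x, y) = y - x * sum_{d in D} y^d."""
    s = solve_s(D)
    r = s / sum(s ** d for d in D)            # from F(r, s) = 0
    Fx = -sum(s ** d for d in D)
    Fyy = -r * sum(d * (d - 1) * s ** (d - 2) for d in D if d >= 2)
    return sqrt(r * Fx / (2 * pi * Fyy)) * n ** -1.5 * r ** -n

def mul(a, b, n_max):
    """Product of two polynomials, truncated after degree n_max."""
    c = [0] * (n_max + 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b[: n_max + 1 - i]):
                c[i + j] += ai * bj
    return c

def exact(D, n_max: int) -> list:
    """Coefficients t_D(0..n_max), iterating T = x * sum_{d in D} T^d."""
    T = [0] * (n_max + 1)
    for _ in range(n_max):                    # each pass fixes one more coefficient
        total = [0] * (n_max + 1)
        power = [1] + [0] * n_max             # T^0
        for d in range(max(D) + 1):
            if d in D:
                total = [u + v for u, v in zip(total, power)]
            power = mul(power, T, n_max)
        T = [0] + total[:n_max]               # multiply by x
    return T

D = {0, 1, 2}
t = exact(D, 40)
for n in (5, 10, 20, 40):
    print(n, t[n], round(estimate(D, n) / t[n], 3))
```

For $D = \{0, 1, 2\}$ the equations give $s = 1$ and $r = 1/3$, so the exact counts grow like $3^n$.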

Example 11.34 Unlabeled rooted trees In (11.29) we showed that the generating function
by vertices for unlabeled rooted trees in which each vertex has outdegree at most three satisfies

$$T(x) = 1 + x\, \frac{T(x)^3 + 3T(x)T(x^2) + 2T(x^3)}{6}. \qquad (11.54)$$

Since $T(x^2)$ and $T(x^3)$ appear here, it does not seem possible to apply Principle 11.7. However,
it can be applied: we simply set

$$F(x,y) = 1 - y + x\, \frac{y^3 + 3T(x^2)\,y + 2T(x^3)}{6}.$$

The reason why this is permitted is somewhat technical: The singularity $r$ of $T(x)$ turns out to lie
in $(0,1)$, which guarantees that $x^3 < x^2 < x$ when $x$ is near $r$. As a result $T(x^2)$ and $T(x^3)$ are not
near a singularity when $x$ is near $r$.

Even if you did not fully follow the last part of the previous paragraph, you may have noticed
the claim that the singularity of $T(x)$ lies in $(0,1)$. We should prove this, but we will not do so.
Instead we will be satisfied with noting that our calculations produce a value of $r \in (0,1)$. This is a
circular argument since we needed that fact before beginning.

The equations for $r$ and $s$ are

$$1 - s + r\, \frac{s^3 + 3T(r^2)\,s + 2T(r^3)}{6} = 0 \quad (11.55) \qquad\text{and}\qquad -1 + r\, \frac{s^2 + T(r^2)}{2} = 0. \quad (11.56)$$

We have a problem here: In order to compute $r$ and $s$ we need to be able to evaluate the function
$T$ accurately and we lack an explicit formula for $T(x)$. This can be gotten around as follows.

Suppose we want to know $T(x)$ at some value of $x = p \in (0,1)$. If we knew the values of $T(p^2)$
and $T(p^3)$, we could regard (11.54) as a cubic in the value of $T(p)$ and solve it. On the other hand, if
we knew the values of $T((p^2)^2)$ and $T((p^2)^3)$, we could set $x = p^2$ in (11.54) and solve the resulting
cubic for $T(p^2)$. On the surface, this does not seem to be getting us anywhere because we keep
needing to know $T(p^k)$ for higher and higher powers $k$. Actually, that is not what we need; we need
to know $T(p^k)$ with some desired degree of accuracy. When $k$ is large, $p^k$ is close to zero and so $T(p^k)$
is close to $T(0) = t_0 = 1$. (We remind you that when we derived (11.54) we chose to set $t_0 = 1$.)
How large does $k$ need to be for a given accuracy? We won't go into that.

There is another trick that will simplify our work further. Since $s = T(r)$, we can eliminate $s$
from (11.56) and so $r\,(T(r)^2 + T(r^2))/2 = 1$ is the equation for $r$. We can now use (11.55) to check for
errors in our calculations.

Once $r$ and $s$ have been found it is a fairly straightforward matter to apply (11.50). The only
issue that may cause some difficulty is the evaluation of

$$F_x(r,s) = \frac{s^3 + 3T(r^2)\,s + 2T(r^3)}{6} + r^2\, T'(r^2)\, s + r^3\, T'(r^3)$$

because of the presence of $T'$. We can differentiate (11.54) with respect to $x$ and solve the result for
$T'(x)$ in terms of $x$, $T'(x^2)$, $T'(x^3)$ and values of $T$. Using this recursively we can evaluate $T'$ to any
desired degree of accuracy, much as we can $T$. After considerable calculation, we find that

$$t_n \sim (0.51788\ldots)\, n^{-3/2}\, (0.3551817\ldots)^{-n}, \qquad (11.57)$$

which can be proved. Since we have the results

n          5      10      20         30             40             50
(11.57)    8.2    512.5   5671108.   9.661 × 10^10  1.964 × 10^15  4.398 × 10^19
t_n        8      507     5622109    9.599 × 10^10  1.954 × 10^15  4.380 × 10^19
ratio      1.02   1.01    1.009      1.006          1.005          1.004

the estimate is a good approximation even for small $n$. □
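
The exact values in the table can be regenerated from (11.54) alone. The Python sketch below extracts the coefficient of $x^{n-1}$ from $T(x)^3 + 3T(x)T(x^2) + 2T(x^3)$, which only involves $t_0, \ldots, t_{n-1}$, and compares the result with (11.57). The function names are illustrative, and the constants in the estimate are the truncated values quoted in (11.57), not recomputed here.

```python
def tree_counts(n_max: int) -> list:
    """t_0..t_n_max from T(x) = 1 + x (T(x)^3 + 3 T(x) T(x^2) + 2 T(x^3)) / 6."""
    t = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        m = n - 1  # coefficient of x^m inside the parentheses; uses t_0..t_m only
        cube = sum(t[i] * t[j] * t[m - i - j]
                   for i in range(m + 1) for j in range(m + 1 - i))
        mixed = sum(t[m - 2 * k] * t[k] for k in range(m // 2 + 1))
        third = t[m // 3] if m % 3 == 0 else 0
        t[n] = (cube + 3 * mixed + 2 * third) // 6
    return t

def estimate_11_57(n: int) -> float:
    """Right side of (11.57), using the truncated constants quoted in the text."""
    return 0.51788 * n ** -1.5 * 0.3551817 ** -n

t = tree_counts(50)
for n in (5, 10, 20, 30, 40, 50):
    print(n, t[n], round(estimate_11_57(n) / t[n], 3))
```

Because each new coefficient depends only on earlier ones, no equation solving is needed for the exact values; the difficulty discussed above arises only when evaluating $T$ at the real number $r$.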
Exercises 


In  this  set  of  exercises,  "estimate"  always  means  asymptotically;  i.e.,  for  large  n.  Since  you  will  be  using  the 
various  principles  in  the  text,  your  results  will  only  be  what  is  probably  true;  however,  the  results  obtained 
will,  in  fact,  be  correct.  Some  problems  ask  you  to  use  alternate  methods  to  obtain  asymptotic  estimates. 
We  recommend  that,  after  doing  such  a  problem,  you  reflect  on  the  amount  of  work  each  method  requires 
and  the  accuracy  of  the  result  it  produced.  In  many  of  the  exercises,  enumeration  formulas  are  given.  You 
need  not  derive  them  unless  you  are  asked  to  do  so. 

11.4.1. A path enumeration problem leads to the recursion $a_n = 2a_{n-1} + a_{n-2}$ for $n \ge 2$ with initial
conditions $a_0 = 1$ and $a_1 = 3$.

(a) Estimate $a_n$ directly from the recursion using Principle 11.1 (p. 341).

(b) Determine the ordinary generating function and use (11.44) to estimate $a_n$.

(c) Use the ordinary generating function to obtain an explicit formula for $a_n$ and use this formula
to estimate $a_n$.

11.4.2. The recursions

$$U_n = nU_{n-1} + 2U_{n-2} - (n-4)U_{n-3} - U_{n-4}$$
$$V_n = (n-1)(V_{n-1} + V_{n-2}) + V_{n-3}$$

arise in connection with the "menage" problems. Estimate $U_n$ and $V_n$ from the recursions.

11.4.3. In Example 7.13 (p. 211), an upper bound was found for the running time to merge sort a $2^k$-long
list. Show that the running time to merge sort an $n$-long list is $O(n \log n)$.
Note: This is for general $n$, not just $n = 2^k$ and it is for any list, not just worst case.
11.4.4. Let $a_n$ be the number of permutations $f$ of n such that $f^k(x) = x$ for all $x \in$ n.

(a) Show that

$$a_n = \sum_{d \mid k} \binom{n-1}{d-1} (d-1)!\; a_{n-d},$$

where "$d \mid k$" beneath the summation sign (read "$d$ divides $k$") means that the sum is over all
positive integers $d$ such that $k/d$ is an integer.

(b) Estimate $a_n$ from the recursion.

11.4.5. A functional digraph is a simple digraph in which each vertex has outdegree 1. (See Exercise 11.2.17.)
We say it is connected if the associated graph is connected.

(a) The number of labeled connected $n$-vertex functional digraphs is

$$\sum_{k=1}^{n} \frac{n!\, n^{n-k-1}}{(n-k)!}.$$

Obtain the estimate $\sqrt{\pi n/2}\; n^{n-1}$ for this summation.

(b) The average number of components in a labeled $n$-vertex functional digraph is

$$\sum_{k=1}^{n} \frac{n!}{k\, n^k\, (n-k)!}.$$

Obtain an estimate for this summation.
11.4.6. Let $a_n$ be the number of partitions of n into ordered blocks.

(a) Show that $\sum_n a_n x^n/n! = (2 - e^x)^{-1}$.

(b) Estimate $a_n$ from the generating function.

*(c) By expanding $(1 - e^x/2)^{-1}$, show that

$$a_n = \sum_{k=1}^{\infty} \frac{k^n}{2^{k+1}}$$

and use this summation to estimate $a_n$.

11.4.7. Let $S$ be a set of positive integers and let $a_n$ be the number of permutations $f$ of n such that none
of the cycles of $f$ have a length in $S$. Then

$$\sum_{n \ge 0} a_n x^n/n! = \frac{1}{1-x} \exp\Bigl(-\sum_{k \in S} x^k/k\Bigr).$$

When $S$ is a finite set, estimate $a_n$.

11.4.8. Let $a_n$ be the number of labeled simple $n$-vertex graphs all of whose components are cycles.

(a) Show that

$$\sum_{n \ge 0} a_n x^n/n! = \frac{\exp(-x/2 - x^2/4)}{\sqrt{1-x}}.$$

(b) Obtain an estimate for $a_n$.

11.4.9. The EGF for permutations all of whose cycles are odd is $A_o(x) = \sqrt{\frac{1+x}{1-x}}$ and that for permutations
all of whose cycles are even is $A_e(x) = (1 - x^2)^{-1/2}$.

(a) Why can't our principles for generating functions be used to estimate the coefficients of $A_o(x)$
and $A_e(x)$?

(b) If you apply the relevant principle, it will give the right answer anyway for $A_o(x)$ but not for
$A_e(x)$. Apply it.

(c) Show that $a_{e,2n+1} = 0$ and $a_{e,2n} = \binom{2n}{n} (2n)!\, 4^{-n}$ and then use Stirling's formula.

(d) Use $A_o(x) = (1 + x)(1 - x^2)^{-1/2}$ to obtain formulas for $a_{o,n}$.
11.4.10. Let $S$ be a set of positive integers and let $S'$ be those positive integers not in $S$. Let $c_n(S)$ be the
number of compositions of $n$ all of whose parts lie in $S$, with $c_0(S) = 1$.

(a) Derive the formula

$$\sum_{n=0}^{\infty} c_n(S)\, x^n = \Bigl(1 - \sum_{k \in S} x^k\Bigr)^{-1}.$$

(b) Explain how to derive asymptotics for $c_n(S)$ when $S$ is a finite set.

(c) Explain how to derive asymptotics for $c_n(S)$ when $S'$ is a finite set.

11.4.11. A "path" of length $n$ is a sequence $0 = u_0, u_1, \ldots, u_n = 0$ of nonnegative integers such that
$u_{k+1} - u_k \in \{-1, 0, 1\}$ for $k < n$. The ordinary generating function for such paths by length is

$$\frac{1}{\sqrt{1 - 2x - 3x^2}}.$$

Estimate the number of such paths by length. (See Exercise 10.3.2.)

11.4.12. In Exercise 10.4.7 (p. 300) we showed that the generating function for an unlabeled binary RP-tree
by number of vertices is $(1 - x - \sqrt{1 - 2x - 3x^2})/2x$. Estimate the coefficients.

11.4.13. A certain kind of unlabeled RP-trees have the generating function

$$\frac{1 + x^2 - \sqrt{(1 + x^2)^2 - 4x}}{2}$$

when counted by vertices. Estimate the number of such trees with $n$ vertices.

11.4.14. Show that the EGF for permutations with exactly $k$ cycles is

$$\frac{1}{k!} \bigl(-\ln(1 - x)\bigr)^k$$

and use this to estimate the number of such permutations when $k$ is fixed and $n$ is large.

11.4.15. Let $h_n$ be the number of unlabeled $n$-vertex RP-trees in which no vertex has outdegree 1 and let
$H(x)$ be the ordinary generating function. Show that

$$H(x)^2 - H(x) + \frac{x}{1+x} = 0$$

and use this to estimate $h_n$.

11.4.16. Let $h_n$ be the number of rooted labeled $n$-vertex trees for which no vertex has outdegree 2. It can
be shown that the EGF $H(x)$ satisfies $(1 + x)H(x) = x e^{H(x)}$. Estimate $h_n$.

11.4.17. Let $D$ be a set of positive integers. Let $a_n$ be the number of functions $f$ from n to the positive
integers such that, (a) if $f$ takes on the value $k$, then it takes on the value $i$ for all $0 < i < k$, and
(b) if $f$ takes on the value $k$ the number of times it does so lies in $D$. It can be shown that the EGF
for the $a_n$'s is

$$A(x) = \Bigl(1 - \sum_{k \in D} x^k/k!\Bigr)^{-1}.$$

For whatever $D$'s you can, show how to estimate $a_n$.

11.4.18. Let the ordinary generating function for the number of rooted unlabeled $n$-vertex trees in which
every vertex has outdegree at most 2 be $T(x)$. For convenience, set $t_0 = 1$. It can be shown that
$T(x) = 1 + x\,(T(x)^2 + T(x^2))/2$. Estimate the numbers of such trees.
*11.4.19. The results here relate to Exercise 10.4.1 (p. 298). We suppose that $A(x)$ satisfies Principle 11.6 and
that $a_0 = 0$. Our notation will be that of Principle 11.6 and we assume that $b = 0$ for simplicity.

(a) Suppose that $c < 0$. Let $B(x) = A(x)^k$. Show that we expect
$b_n/a_n \sim (g(r)\, n^{-c})^{k-1}\, \Gamma(-c)/\Gamma(-ck)$.

(b) Suppose that $c > 0$ and $h(r) \ne 0$. Let $B(x) = A(x)^k$. Show that we expect
$b_n/a_n \sim k\, h(r)^{k-1}$.

(c) Suppose that $c = 1/2$ and $A(r) < 1$. Let $B(x) = (1 - A(x))^{-1}$. Derive

$$B(x) = \frac{1 - h(x) + f(x)g(x)}{(1 - h(x))^2 - (1 - x/r)\, g(x)^2}$$

and use this to show that we expect $b_n/a_n \sim (1 - h(r))^{-2}$. You may assume that the denominator
$(1 - h(x))^2 - (1 - x/r)\, g(x)^2$ does not vanish on the interval $[-r, r]$.

(d) Suppose that $A(x) = 1$ has a solution $s \in (0, r)$. Show that it is unique. Let $B(x) = (1 - A(x))^{-1}$.
Show that we expect $b_n \sim 1/(A'(s)\, s^{n+1})$. Prove that the solution $A(s) = 1$ will surely exist if
$c < 0$.

(e) Suppose $r < 1$. It can be shown that the radii of convergence of

$$\sum_{k \ge 2} A(x^k)/k \qquad\text{and}\qquad \sum_{k \ge 2} (-1)^{k-1} A(x^k)/k$$

both equal 1. Explain how we could use this fact to obtain asymptotics for sets and multisets in
Exercise 10.4.1 (p. 298) using Principle 11.6, if we could handle $e^{A(x)}$ using the principle.

11.4.20. Recall that the generating function for unlabeled full binary RP-trees by number of leaves is

$$\frac{1 - \sqrt{1 - 4x}}{2}.$$

In  the  following,  Exercise  11.4.19  will  be  useful. 

(a)  Use  Exercise  10.4.1  (p.  298)  to  deduce  information  about  lists  of  such  trees. 

*(b)  Use  Exercise  10.4.1(c,d)  to  deduce  information  about  sets  and  multisets  of  such  trees. 
Hint. Show that

$$\exp\bigl(-\sqrt{1-4x}/2\bigr) = -\sum_{k \ge 0} \frac{(1-4x)^{k+1/2}}{2^{2k+1}(2k+1)!} + \sum_{k \ge 0} \frac{(1-4x)^{k}}{2^{2k}(2k)!}.$$


11.4.21. Let $D$ be a set of nonnegative integers containing 0. Let $t_n$ be the number of rooted labeled $n$-vertex
trees in which the outdegree of every vertex lies in $D$. Let $T(x)$ be the EGF.

(a) Show that $T(x) = x \sum_{d \in D} T(x)^d/d!$.

*(b) Let $k = \gcd(D)$. Show that $t_n = 0$ when $n - 1$ is not a multiple of $k$.
(c) For finite $D$ with $\gcd(D) = 1$, show how to estimate $t_n$.

*11.4.22.  For  each  part  of  Exercise  11.2.2  except  (c),  discuss  what  sort  of  information  about  E-j-  would  be 
useful  for  estimating  the  coefficients  of  the  exponential  generating  function  given  there.  (See  Exer- 
cise 11.4.19.) 
Notes and References


The  text  by  Sedgewick  and  Flajolet  [15]  covers  some  of  the  material  in  this  chapter  and  Chap- 
ter 10,  and  also  contains  related  material. 

Further  discussion  of  exponential  generating  functions  can  be  found  in  many  of  the  references 
given  at  the  end  of  the  previous  chapter.  Other  generating  functions  besides  ordinary  and  exponential 
ones  play  a  role  in  mathematics.  Dirichlet  series  play  an  important  role  in  some  aspects  of  analytic 
number  theory.  Apostol  [1]  gives  an  introduction  to  these  series;  however,  some  background  in 
number  theory  or  additional  reading  in  his  text  is  required.  In  combinatorics,  almost  all  generating 
functions are ordinary or exponential. The next most important class, Gaussian generating functions,
is  associated  with  vector  spaces  over  finite  fields.  Goldman  and  Rota  [7]  discuss  them. 

Lagrange  inversion  can  be  regarded  as  a  theorem  in  complex  analysis  or  as  a  theorem  in 
combinatorics.  In  either  case,  it  can  be  generalized  to  a  set  of  simultaneous  equations  in  several 
variables.  Garsia  and  Shapiro  [6]  prove  such  a  generalization  combinatorially  and  give  additional 
references. For readers familiar with complex analysis, here is a sketch of an analytic proof. Let
$\sum a_n x^n = A(x) = g(T(x))$. By the Cauchy Residue Theorem followed by a change of variables,

$$n a_n = \frac{1}{2\pi i}\oint \frac{A'(x)\,dx}{x^n} = \frac{1}{2\pi i}\oint \frac{g'(T(x))\,T'(x)\,dx}{x^n} = \frac{1}{2\pi i}\oint \frac{g'(T)\,dT}{(T/f(T))^n},$$

which equals the coefficient of $u^{n-1}$ in $g'(u)f(u)^n$ by the Cauchy Residue Theorem.

Tutte's  work  on  rooted  maps  (Exercise  11.2.18)  was  done  in  the  1960s.  Connections  with  his 
work  and  the  (asymptotic)  enumeration  of  polyhedra  are  discussed  in  [3] . 

Polya's  theorem  and  some  generalizations  of  it  were  first  discovered  by  Redfield  [14]  whose  paper 
was  overlooked  by  mathematicians  for  about  forty  years.  A  translation  of  Polya's  paper  together 
with some notes is available [13]. DeBruijn [4] gives an excellent introduction to Polya's theorem and
some  of  its  generalizations  and  applications.  Harary  and  Palmer  [9]  discuss  numerous  applications 
in  graph  theory. 

Textbooks  on  combinatorics  generally  avoid  asymptotics.  Wilf  [16,  Ch.  5]  has  a  nice  introduction 
to  asymptotics  which,  in  some  ways,  goes  beyond  ours.  Books,  such  as  the  one  by  Greene  and 
Knuth  [8],  that  deal  with  analysis  of  algorithms  may  have  some  material  on  asymptotics.  If  you  are 
interested  in  going  beyond  the  material  in  this  text,  you  should  probably  look  at  journal  articles. 
The article by Bender [2] is an introduction to some methods, including ways to deal with the misfits
in Example 11.32 (p. 353). (You should note that the hypotheses of Theorem 5 are too weak. This has
been corrected by Meir and Moon [11].) A much more extensive discussion has been given by [12].
Our  first  principle  for  generating  function  asymptotics  was  adapted  from  the  article  by  Flajolet  and 
Odlyzko  [5].  Polya  [13]  discusses  computing  asymptotics  for  several  classes  of  chemical  compounds. 
A  method  for  dealing  with  various  types  of  trees,  from  combinatorial  description  to  asymptotic 
formula,  is  discussed  by  Harary,  Robinson  and  Schwenk  [10]. 
APPENDIX A

Induction 


Let's explore the idea of induction a bit before getting into the formalities.

Example A.1 A simple inductive proof Suppose that $A(n)$ is an assertion that depends on
$n$. For example, take $A(n)$ to be the assertion "$n! > 2^n$". In attempting to decide whether or not
$A(n)$ is true, we may first try to check it for some small values of $n$. In this example, $A(1)$ = "$1! > 2^1$"
is false, $A(2)$ = "$2! > 2^2$" is false, and $A(3)$ = "$3! > 2^3$" is false; but $A(4)$ = "$4! > 2^4$" is true. We
could go on like this, checking each value of $n$ in turn, but this becomes quite tiring.

If we're smart, we can notice that to show that $A(5)$ is true, we can take the true statement
$A(4)$ = "$4! > 2^4$" and multiply it by the inequality $5 > 2$ to get $5! > 2^5$. This proves that $A(5)$ is
true without doing all of the multiplications necessary to verify $A(5)$ directly. Since $6 > 2$ and $A(5)$
is true, we can use the same trick to verify that $A(6)$ is true. Since $7 > 2$, we could use the fact that
$A(6)$ is true to show that $A(7)$ is true. This, too, becomes tiring.

If we think about what we are doing in a little more generality, we see that if we have verified
$A(n-1)$ for some value of $n > 2$, we can combine this with $n > 2$ to verify $A(n)$. This is a "generic"
or "general" description of how the validity of $A(n-1)$ can be transformed into the validity of $A(n)$.
Having verified that $A(4)$ is true and given a valid generic argument for transforming the validity
of $A(n-1)$ into the validity of $A(n)$, we claim that the statement "$n! > 2^n$ if $n > 3$" has been
proved by induction. We hope you believe that this is a proof, because the alternative is to list the
infinitude of cases, one for each $n$. □
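
As an informal sanity check on this example (a finite check, not a substitute for the proof), the sketch below tests the assertion for small $n$ and then replays the "multiply by $n > 2$" step numerically. The helper name `assertion` is our own.

```python
from math import factorial


def assertion(n):
    """A(n): n! > 2**n."""
    return factorial(n) > 2 ** n


# Which small n satisfy A(n)?  The base case is the first one that does.
print([n for n in range(1, 8) if assertion(n)])  # [4, 5, 6, 7]

# Replay the inductive step: A(n-1) together with n > 2 yields A(n).
for n in range(5, 60):
    assert assertion(n - 1) and n > 2
    # n! = n * (n-1)!  >  2 * 2**(n-1) = 2**n
    assert n * factorial(n - 1) > 2 * 2 ** (n - 1)
    assert assertion(n)
```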

Here  is  a  formulation  of  induction: 

Theorem A.1 Induction Let $A(m)$ be an assertion, the nature of which is dependent on
the integer $m$. Suppose that we have proved $A(n)$ for $n_0 \le n \le n_1$ and the statement

"If $n > n_1$ and $A(k)$ is true for all $k$ such that $n_1 \le k < n$, then $A(n)$ is true."

Then $A(m)$ is true for all $m \ge n_0$.

Definition A.1 The statement "$A(k)$ is true for all $k$ such that $n_1 \le k < n$" is called the
induction assumption or induction hypothesis and proving that this implies $A(n)$ is called
the inductive step. The values of $n$ with $n_0 \le n \le n_1$ are called the base cases.
In many situations, $n_0 = n_1$ and $n_1 = 0$ or 1; however, this is not always true. In fact, our
example requires a different value of $n_1$. Before reading further, can you identify $n_1$ in Example A.1?
Can you identify the induction assumption?


*        *        *       Stop  and  think  about  this!        *        *  * 

In our example, we started by proving $A(4)$ = "$4! > 2^4$", so $n_1 = 4$. The induction assumption is:

$k! > 2^k$ is true for all $k$ such that $4 \le k < n$.

Remark. Sometimes induction is formulated differently. One difference is that people sometimes
use $n + 1$ in place of $n$. Thus the statement in the theorem would be

"If $n \ge n_1$ and $A(k)$ is true for all $n_1 \le k \le n$, then $A(n+1)$ is true."

Another difference is that some people always formulate it with $n_0 = 1$. Finally, some people restrict
the definition of induction even further by allowing you to use only $A(n)$ to prove $A(n+1)$ (or,
equivalently, only $A(n-1)$ to prove $A(n)$), rather than the full range of $A(k)$ for $n_0 \le k < n$.
Putting these changes together, we obtain the following more restrictive formulation of induction
that is found in some texts:


Corollary Restricted induction Let $A(m)$ be an assertion, the nature of which is dependent
on the integer $m$. Suppose that we have proved $A(1)$ and the statement

"If $n > 1$ and $A(n-1)$ is true, then $A(n)$ is true."

Then $A(m)$ is true for all $m \ge 1$.


Since this is a special case of Theorem A.1, we'll simply use Theorem A.1. While there is never a
need to use a simpler form of induction than Theorem A.1, there is sometimes a need for more
general forms of induction. Discussion of these generalizations is beyond the scope of the text.


Example A.2 A summation We would like a formula for the sum of the first $n$ integers. Let
us write $S(n) = 1 + 2 + \cdots + n$ for the value of the sum. By a little calculation,

$$S(1) = 1,\quad S(2) = 3,\quad S(3) = 6,\quad S(4) = 10,\quad S(5) = 15,\quad S(6) = 21.$$

What is the general pattern? It turns out that $S(n) = \frac{n(n+1)}{2}$ is correct for $1 \le n \le 6$. Is it true in
general? This is a perfect candidate for an induction proof with

$$n_0 = n_1 = 1 \qquad\text{and}\qquad A(n):\ \text{"}S(n) = \tfrac{n(n+1)}{2}\text{."} \tag{A.1}$$

Let's prove it. We have shown that $A(1)$ is true. In this case we need only the restricted induction
hypothesis; that is, we will prove the formula for $S(n)$ by using the formula for $S(n-1)$. Here it is
(the inductive step):

$$\begin{aligned}
S(n) &= 1 + 2 + \cdots + n = (1 + 2 + \cdots + (n-1)) + n \\
     &= S(n-1) + n = \frac{(n-1)((n-1)+1)}{2} + n && \text{by } A(n-1), \\
     &= \frac{n(n+1)}{2} && \text{by algebra.}
\end{aligned}$$

This completes the proof. □
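
The two ingredients of this proof, the base case and the algebra in the inductive step, can be checked mechanically for a range of $n$. This is only a sketch under our own naming (`S`, `closed_form`); a finite check like this supports a conjecture but does not replace the induction.

```python
def S(n):
    """Sum of the first n positive integers, computed directly."""
    return sum(range(1, n + 1))


def closed_form(n):
    """The conjectured formula n(n+1)/2."""
    return n * (n + 1) // 2


assert S(1) == closed_form(1)  # base case A(1)

for n in range(2, 500):
    # inductive step: S(n) = S(n-1) + n, and the algebra
    # (n-1)n/2 + n = n(n+1)/2 turns A(n-1) into A(n)
    assert S(n) == S(n - 1) + n
    assert closed_form(n - 1) + n == closed_form(n)

print([S(n) for n in range(1, 7)])  # [1, 3, 6, 10, 15, 21]
```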
Example A.3 Divisibility We will prove that for all integers $x > 1$ and all positive integers $n$,
$x - 1$ divides $x^n - 1$. In this case $n_0 = n_1 = 1$, $A(n)$ is the statement that $x - 1$ divides $x^n - 1$. Since
$A(1)$ states that $x - 1$ divides $x - 1$ it is obvious that $A(1)$ is true. Now for the induction step. How
can we rephrase $A(n)$ so that it involves $A(n-1)$? After a bit of thought, you may come up with

$$x^n - 1 = x(x^{n-1} - 1) + (x - 1). \tag{A.2}$$

Once this is discovered, the rest is easy:

• By $A(n-1)$, $x^{n-1} - 1$ is divisible by $x - 1$, thus $x(x^{n-1} - 1)$ is divisible by $x - 1$.

• Also, $x - 1$ divides $x - 1$.

• Thus the sum in (A.2) is divisible by $x - 1$.

This completes the proof.

We could have used the same proof as this one for the statement that the polynomial $x - 1$
divides the polynomial $x^n - 1$ for all positive integers $n$. □
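
A small numerical sketch of this example follows: it checks identity (A.2) and the divisibility claim over a grid of values. The function name `check_divisibility` is hypothetical and the ranges are arbitrary.

```python
def check_divisibility(x, n):
    """Check identity (A.2) and that x - 1 divides x**n - 1."""
    identity_holds = x ** n - 1 == x * (x ** (n - 1) - 1) + (x - 1)
    divides = (x ** n - 1) % (x - 1) == 0
    return identity_holds and divides


# integers x > 1 and positive integers n, as in the example
assert all(check_divisibility(x, n)
           for x in range(2, 30)
           for n in range(1, 30))
```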

In  the  last  example,  the  hardest  part  was  figuring  out  how  to  use  A{n  —  1)  in  the  proof  of  A{n). 
This  is  quite  common: 

Observation The difficult part of a proof by induction is often figuring out how to use the
inductive hypothesis.

Simple  examples  of  inductive  proofs  may  not  make  this  clear.  Another  difficulty  that  can  arise,  as 
happened  in  Example  A.2,  may  be  that  we  are  not  even  given  a  theorem  but  are  asked  to  discover  it. 

Example A.4 An integral formula We want to prove the formula

$$\int_0^1 x^a (1-x)^b\,dx = \frac{a!\,b!}{(a+b+1)!}$$

for nonnegative integers $a$ and $b$.

What should we induct on? We could choose $a$ or $b$. Let's use $b$. Here it is with $b$ replaced with
$n$ to look more like our usual formulation:

$$\int_0^1 x^a (1-x)^n\,dx = \frac{a!\,n!}{(a+n+1)!}. \tag{A.3}$$

This is our $A(n)$. Also $n_0 = n_1 = 0$, the smallest nonnegative integer. You should verify that $A(0)$
is true, thereby completing the first step in the inductive proof. (Remember that $0! = 1$.)

How can we use our inductive hypothesis to prove $A(n)$? This is practically equivalent to asking
how to manipulate the integral in (A.3) so that the power of $(1 - x)$ is reduced. Finding the answer
depends on a knowledge of techniques of integration. Someone who has mastered them will realize
that integration by parts could be used to reduce the degree of $1 - x$ in the integral. Let's do it.
Integration by parts states that $\int u\,dv = uv - \int v\,du$. We set $u = (1-x)^n$ and $dv = x^a\,dx$. Then
$du = -n(1-x)^{n-1}\,dx$, $v = \frac{x^{a+1}}{a+1}$ and

$$\begin{aligned}
\int_0^1 x^a (1-x)^n\,dx &= \left.\frac{(1-x)^n x^{a+1}}{a+1}\right|_0^1 + \frac{n}{a+1}\int_0^1 x^{a+1}(1-x)^{n-1}\,dx \\
&= \frac{n}{a+1}\int_0^1 x^{a+1}(1-x)^{n-1}\,dx.
\end{aligned}$$

The last integral can be evaluated by $A(n-1)$ and so

$$\int_0^1 x^a (1-x)^n\,dx = \frac{n}{a+1}\cdot\frac{(a+1)!\,(n-1)!}{((a+1)+(n-1)+1)!} = \frac{a!\,n!}{(a+n+1)!}.$$

This completes the inductive step, thereby proving (A.3). □
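
The formula and the reduction step can both be checked in exact rational arithmetic. The sketch below evaluates the integral by expanding $(1-x)^b$ with the binomial theorem and integrating term by term, which is our own choice of method (it is not the argument used in the example). The names `beta_integral` and `closed_form` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Exact integral of x^a (1-x)^b over [0, 1].

    Expands (1-x)^b = sum_j (-1)^j C(b, j) x^j and integrates each
    monomial x^(a+j) to 1/(a+j+1).
    """
    return sum(Fraction((-1) ** j * comb(b, j), a + j + 1)
               for j in range(b + 1))


def closed_form(a, b):
    """The claimed value a! b! / (a+b+1)!."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


for a in range(8):
    assert beta_integral(a, 0) == closed_form(a, 0)  # base case A(0)
    for n in range(1, 8):
        # the integration-by-parts reduction used in the inductive step
        assert beta_integral(a, n) == Fraction(n, a + 1) * beta_integral(a + 1, n - 1)
        assert beta_integral(a, n) == closed_form(a, n)
```

Note that the reduction replaces $a$ by $a + 1$, so the induction hypothesis is being used for every nonnegative integer $a$ at once.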
We now do a more combinatorial inductive proof. This requires a definition from the first chapter.
If $A$ is a set, let $|A|$ stand for the number of elements in $A$.

Definition A.2 Cartesian Product If $C_1, \ldots, C_k$ are sets, the Cartesian product of the
sets is written $C_1 \times \cdots \times C_k$ and consists of all $k$ long lists $(x_1, \ldots, x_k)$ with $x_i \in C_i$ for
$1 \le i \le k$.

Example A.5 Size of the Cartesian product We want to prove

$$|C_1 \times \cdots \times C_n| = |C_1| \cdots |C_n|. \tag{A.4}$$

This is our $A(n)$. Since this is trivially true for $n = 1$, it is reasonable to take $n_0 = n_1 = 1$. It turns
out that this choice will make the inductive step more difficult. It is better to choose $n_1 = 2$. (Normally
one would discover that part way through the proof. To simplify things a bit, we've "cheated"
by telling you ahead of time.)

To begin with, we need to prove $A(2)$. How can this be done? This is the difficult part. Let's
postpone it for now and do the inductive step.

We claim that

the sets $C_1 \times \cdots \times C_n$ and $(C_1 \times \cdots \times C_{n-1}) \times C_n$ have the same number of elements.

Why is this? They just differ by pairs of parentheses: Suppose $x_1 \in C_1$, $x_2 \in C_2$, ... and $x_n \in C_n$.
Then

$$(x_1, \ldots, x_n) \in C_1 \times \cdots \times C_n$$

and

$$((x_1, \ldots, x_{n-1}), x_n) \in (C_1 \times \cdots \times C_{n-1}) \times C_n$$

just differ by pairs of parentheses. Thus

$$\begin{aligned}
|C_1 \times \cdots \times C_n| &= |(C_1 \times \cdots \times C_{n-1}) \times C_n| && \text{by the above,} \\
&= |C_1 \times \cdots \times C_{n-1}| \cdot |C_n| && \text{by } A(2), \\
&= (|C_1| \cdots |C_{n-1}|)\,|C_n| && \text{by } A(n-1).
\end{aligned}$$

This completes the inductive step. Note that this is different from our previous inductive proofs in
that we used both $A(n-1)$ and $A(2)$ in the inductive step.

We must still prove $A(2)$. Let $C_1 = \{y_1, \ldots, y_k\}$, where $k = |C_1|$. Then

$$C_1 \times C_2 = (\{y_1\} \times C_2) \cup \cdots \cup (\{y_k\} \times C_2).$$

Since all of the sets in the union are disjoint, the number of elements in the union is the sum of the
number of elements in each of the sets separately. Thus

$$|C_1 \times C_2| = |\{y_1\} \times C_2| + \cdots + |\{y_k\} \times C_2|.$$

You should be able to see that $|\{y_i\} \times C_2| = |C_2|$. Since the sum has $k = |C_1|$ terms, all of which
equal $|C_2|$, we finally have $|C_1 \times C_2| = |C_1| \cdot |C_2|$. □
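
The regrouping idea in this example, that $(x_1, \ldots, x_n)$ and $((x_1, \ldots, x_{n-1}), x_n)$ differ only by parentheses, can be made concrete with a short sketch. The sample sets and the names `cartesian_size` and `regroup` are our own illustrative choices.

```python
from itertools import product
from math import prod


def cartesian_size(sets):
    """Count the lists in C_1 x ... x C_n by enumerating them."""
    return sum(1 for _ in product(*sets))


def regroup(t):
    """Map (x_1, ..., x_n) to ((x_1, ..., x_{n-1}), x_n)."""
    return (t[:-1], t[-1])


sets = [{"a", "b"}, {0, 1, 2}, {"x", "y"}]
flat = set(product(*sets))
# the same lists built as (C_1 x ... x C_{n-1}) x C_n
nested = set(product(product(*sets[:-1]), sets[-1]))

# regrouping is a one-to-one correspondence between the two sets
assert {regroup(t) for t in flat} == nested
assert len(flat) == len(nested) == prod(len(c) for c in sets)
print(cartesian_size(sets))  # 12
```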
Our final example requires more than $A(2)$ and $A(n-1)$ to prove $A(n)$.

Example A.6 Every integer is a product of primes A positive integer $n > 1$ is called a
prime if its only divisors are 1 and $n$. The first few primes are 2, 3, 5, 7, 11, 13, 17, 19, 23. We now
prove that every integer $n > 1$ is a product of primes, where we consider a single prime $p$ to be a
product of one prime, itself. We shall prove $A(n)$:

"Every integer $n \ge 2$ is a product of primes."

We start with $n_0 = n_1 = 2$, which is a prime and hence a product of primes. Assume the induction
hypothesis and consider $A(n)$. If $n$ is a prime, then it is a product of primes (itself). Otherwise,
$n$ has a divisor $s$ with $1 < s < n$ and so $n = st$ where $1 < t < n$. By the induction hypotheses $A(s)$
and $A(t)$, $s$ and $t$ are each a product of primes, hence $n = st$ is a product of primes. This completes
the proof of $A(n)$. □

In all except Example A.5, we used algebra or calculus manipulations to arrange $A(n)$ so that
we could apply the inductive hypothesis. In Example A.5, we used a set theoretic argument: We
found that the elements in the set $C_1 \times \cdots \times C_n$ could be put into one to one correspondence with
the elements in the set $(C_1 \times \cdots \times C_{n-1}) \times C_n$. This sort of set-theoretic argument is more common
in combinatorial inductive proofs than it is in noncombinatorial ones.

Exercises


In these exercises, indicate clearly

(i) what $n_0$, $n_1$ and $A(n)$ are,

(ii) what the inductive step is and

(iii) where the inductive hypothesis is used.

A.1. Prove that the sum of the first $n$ odd numbers is $n^2$.

A.2. Prove that $\sum_{k=0}^{n} k^2 = n(n+1)(2n+1)/6$ for $n \ge 0$.

A.3. Conjecture and prove the general formula of which the following are special cases:

$$1 - 4 + 9 - 16 = -(1 + 2 + 3 + 4)$$
$$1 - 4 + 9 - 16 + 25 = 1 + 2 + 3 + 4 + 5.$$

*A.4. Conjecture and prove the general formula of which the following are special cases:

$$\frac{x}{1+x} + \frac{2x^2}{1+x^2} = \frac{x}{1-x} - \frac{4x^4}{1-x^4}$$

$$\frac{x}{1+x} + \frac{2x^2}{1+x^2} + \frac{4x^4}{1+x^4} = \frac{x}{1-x} - \frac{8x^8}{1-x^8}.$$

A.5. Some calculus texts omit the proof of $(x^n)' = nx^{n-1}$ or slide over the inductive nature of the proof.
Give a proper inductive proof for $n \ge 1$ using $x' = 1$ and the formula for the derivative of a product.

A.6. Conjecture and prove a formula for $\int_0^\infty x^n e^{-x}\,dx$ for $n \ge 0$. (Your answer should not include
integrals.)
A.7. What is wrong with the following inductive proof that all people have the same sex? Let $A(n)$ be "In
any group of $n$ people, all people are of the same sex." This is obviously true for $n = 1$. For $n > 1$,
label the people $P_1, \ldots, P_n$. By the induction assumption, $P_1, \ldots, P_{n-1}$ are all of the same sex and
$P_2, \ldots, P_n$ are all of the same sex. Since $P_2$ belongs to both groups, the sex of the two groups is the
same and so we are done.

A.8. We will show by induction that $1 + 2 + \cdots + n = (2n+1)^2/8$. By the inductive hypothesis,
$1 + 2 + \cdots + (n-1) = (2n-1)^2/8$. Adding $n$ to both sides and using $n + (2n-1)^2/8 = (2n+1)^2/8$,
we obtain the formula. What is wrong with this proof?

A.9.  Imagine  drawing  n  distinct  straight  lines  so  as  to  divide  the  plane  into  regions  in  some  fashion.  Prove 
that  the  regions  can  be  assigned  numbers  0  and  1  so  that  if  two  regions  share  a  line  segment  in  their 
boundaries,  then  they  are  numbered  differently. 


APPENDIX  B 

Rates  of  Growth 

and 

Analysis  of  Algorithms 


Suppose  we  have  an  algorithm  and  someone  asks  us  "How  good  is  it?"  To  answer  that  question, 
we  need  to  know  what  they  mean.  They  might  mean  "Is  it  correct?"  or  "Is  it  understandable?"  or 
"Is  it  easy  to  program?"  We  won't  deal  with  any  of  these. 

They also might mean "How fast is it?" or "How much space does it need?" These two
questions  can  be  studied  by  similar  methods,  so  we'll  just  focus  on  speed.  Even  now,  the  ques- 
tion is  not  precise  enough.  Does  the  person  mean  "How  fast  is  it  on  this  particular  problem 
and  this  particular  machine  using  this  particular  code  and  this  particular  compiler?"  We  could 
answer  this  simply  by  running  the  program!  Unfortunately,  that  doesn't  tell  us  what  would 
happen  with  other  machines  or  with  other  problems  that  the  algorithm  is  designed  to  han- 
dle. 

We  would  like  to  answer  a  question  such  as  "How  fast  is  Algorithm  1  for  finding  a  span- 
ning tree?"  in  such  a  way  that  we  can  compare  that  answer  to  "How  fast  is  Algorithm  2  for 
finding  a  spanning  tree?"  and  obtain  something  that  is  not  machine  or  problem  dependent.  At 
first,  this  may  sound  like  an  impossible  goal.  To  some  extent  it  is;  however,  quite  a  bit  can  be 
said. 

How do we achieve machine independence? We think in terms of simple machine operations such
as multiplication, fetching from memory and so on. If one algorithm uses fewer of these than another,
it should be faster. Those of you familiar with computer instruction timing will object that
different basic machine operations take different amounts of time. That's true, but the times are
not wildly different. Thus, if one algorithm uses a lot fewer operations than another, it should be
faster. It should be clear from this that we can be a bit sloppy about what we call an operation;
for example, we might call something like $x = a + b$ one operation. On the other hand, we can't
be so sloppy that we call $x = a_1 + \cdots + a_n$ one operation if $n$ is something that can be arbitrarily
large.

Suppose we have an algorithm for a class of problems. If we program the algorithm in some language
and use some compiler to produce code that we run on some machine, then there is a function
$f(n)$ that measures the average (or worst, if you prefer) running time for the program on problems
of size $n$. We want to study how $f(n)$ grows with $n$, but we want to express our answers in a manner
that is independent of the language, compiler, and computer. Mathematicians have introduced
notation  that  is  quite  useful  for  studying  rates  of  growth  in  this  manner.  We'll  study  the  notation 
in  this  appendix. 


B.1 The Basic Functions


Example B.1 Let's look at how long it takes to find the maximum of a list of $n$ integers where
we know nothing about the order they are in or how big the integers are. Let $a_1, \ldots, a_n$ be the list
of integers. Here's our algorithm for finding the maximum.

    max = a_1
    For i = 2, ..., n
        If a_i > max, then max = a_i.
    End for
    Return max

Being sloppy, we could say that the entire comparison and replacement in the "If" takes an
operation and so does the stepping of the index $i$. Since this is done $n - 1$ times, we get $2n - 2$
operations. There are some setup and return operations, say $s$, giving a total of $2n - 2 + s$ operations.
Since all this is rather sloppy, all we can really say is that for large $n$ and actual code
on an actual machine, the procedure will take about $Cn$ "ticks" of the machine's clock. Since we
can't determine $C$ by our methods, it will be helpful to have a notation that ignores it. We use
$\Theta(f(n))$ to designate any function that behaves like a constant times $f(n)$ for arbitrarily large $n$.
Thus we would say that the "If" takes time $\Theta(n)$ and the setup and return takes time $\Theta(1)$.
Thus the total time is $\Theta(n) + \Theta(1)$. Since $n$ is much bigger than 1 for large $n$, the total time is
$\Theta(n)$. □
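
To make the operation count concrete, here is a sketch of the same algorithm in Python, instrumented to count operations under the text's "sloppy" convention (one operation for stepping the index, one for the compare-and-replace). The function name `find_max` and the counter `ops` are our own; setup and return costs are not counted.

```python
def find_max(a):
    """Return (maximum of a, number of loop operations counted)."""
    ops = 0
    current = a[0]
    for x in a[1:]:
        ops += 1  # stepping the index i
        ops += 1  # the whole "If": comparison and possible replacement
        if x > current:
            current = x
    return current, ops


values = [3, 1, 4, 1, 5, 9, 2, 6]
print(find_max(values))  # (9, 14): here n = 8 and 2n - 2 = 14
```

The count is $2n - 2$ regardless of the order of the input, which is why the running time is $\Theta(n)$ rather than merely $O(n)$.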

We need to define $\Theta$ more precisely and list its most important properties. We will also find it
useful to define $O$, read "big oh."

Definition B.1 Notation for $\Theta$ and $O$ Let $f$ and $g$ be functions from the positive integers
to the real numbers.

• We say that $g(n)$ is $O(f(n))$ if there exists a positive constant $B$ such that $|g(n)| \le B|f(n)|$
for all sufficiently large $n$. In this case we say that $g$ grows no faster than $f$ or, equivalently,
that $f$ grows at least as fast as $g$.

• We say that $g(n)$ is $O^+(f(n))$ if $g(n)$ is $O(f(n))$ and $g(n) > 0$ for sufficiently large
$n$.

• We say that $g(n)$ is $\Theta(f(n))$ if (i) $g(n)$ is $O(f(n))$ and (ii) $f(n)$ is $O(g(n))$. In this case we
say that $f$ and $g$ grow at the same rate.

• We say that $g(n)$ is $\Theta^+(f(n))$ if $g(n)$ is $\Theta(f(n))$ and $g(n) > 0$ for sufficiently large
$n$.
Remarks 1. The phrase "$S(n)$ is true for all sufficiently large $n$" means that there is some integer
$N$ such that $S(n)$ is true whenever $n \ge N$.

2. Saying that something is $\Theta^+(f(n))$ gives an idea of how big it is for large values of $n$. Saying that
something is $O^+(f(n))$ gives an idea of an upper bound on how big it is for all large values of $n$. (We
said "idea of" because we don't know what the constants are.)

3. The notation $O^+$ and $\Theta^+$ is not standard. We have introduced it because it is convenient when
combining functions.

Theorem B.1 Some properties of $\Theta$ and $O$ We have

(a) $g(n)$ is $\Theta(f(n))$ if and only if there are positive constants $A$ and $B$ such that
$A|f(n)| \le |g(n)| \le B|f(n)|$ for all sufficiently large $n$.

(b) If $g(n)$ is $\Theta^+(f(n))$, then $g(n)$ is $\Theta(f(n))$.
If $g(n)$ is $O^+(f(n))$, then $g(n)$ is $O(f(n))$.
If $g(n)$ is $\Theta(f(n))$, then $g(n)$ is $O(f(n))$.

(c) $f(n)$ is $\Theta(f(n))$ and $f(n)$ is $O(f(n))$.

(d) If $g(n)$ is $\Theta(f(n))$ and $C$ and $D$ are nonzero constants, then $Cg(n)$ is $\Theta(Df(n))$.
If $g(n)$ is $O(f(n))$ and $C$ and $D$ are nonzero constants, then $Cg(n)$ is $O(Df(n))$.

(e) If $g(n)$ is $\Theta(f(n))$, then $f(n)$ is $\Theta(g(n))$.

(f) If $g(n)$ is $\Theta(f(n))$ and $f(n)$ is $\Theta(h(n))$, then $g(n)$ is $\Theta(h(n))$.
If $g(n)$ is $O(f(n))$ and $f(n)$ is $O(h(n))$, then $g(n)$ is $O(h(n))$.

(g) If $g_1(n)$ is $\Theta(f_1(n))$ and $g_2(n)$ is $\Theta(f_2(n))$, then $g_1(n)g_2(n)$ is $\Theta(f_1(n)f_2(n))$ and,
if in addition $f_2$ and $g_2$ are never zero, then $g_1(n)/g_2(n)$ is $\Theta(f_1(n)/f_2(n))$.
If $g_1(n)$ is $O(f_1(n))$ and $g_2(n)$ is $O(f_2(n))$, then $g_1(n)g_2(n)$ is $O(f_1(n)f_2(n))$.

(h) If $g_1(n)$ is $\Theta^+(f_1(n))$ and $g_2(n)$ is $\Theta^+(f_2(n))$, then $g_1(n) + g_2(n)$ is $\Theta^+(\max(|f_1(n)|, |f_2(n)|))$.
If $g_1(n)$ is $O(f_1(n))$ and $g_2(n)$ is $O(f_2(n))$, then $g_1(n) + g_2(n)$ is $O(\max(|f_1(n)|, |f_2(n)|))$.

(i) If $g(n)$ is $\Theta^+(f(n))$ and $h(n)$ is $O^+(f(n))$, then $g(n) + h(n)$ is $\Theta^+(f(n))$.

Note that by Theorem 5.1 (p. 127) and properties (c), (e) and (f) above, the statement "$g(n)$
is $\Theta(f(n))$" defines an equivalence relation on the set of functions from the positive integers to
the reals. Similarly, "$g(n)$ is $\Theta^+(f(n))$" defines an equivalence relation on the set of functions
which are positive for sufficiently large $n$.
Proof: If you completely understand the definitions of $O$, $O^+$, $\Theta$, and $\Theta^+$, many parts of the theorem
will be obvious to you. None of the parts is difficult to prove and so we leave most of the proofs
as an exercise. To help you out, we'll do a couple of proofs.

We'll do (f) for $\Theta$. By (a), there are positive constants $A_i$ and $B_i$ such that

$$A_1|f(n)| \le |g(n)| \le B_1|f(n)|$$

and

$$A_2|h(n)| \le |f(n)| \le B_2|h(n)|$$

for all sufficiently large $n$. It follows that

$$A_1A_2|h(n)| \le A_1|f(n)| \le |g(n)| \le B_1|f(n)| \le B_1B_2|h(n)|$$

for all sufficiently large $n$. With $A = A_1A_2$ and $B = B_1B_2$, it follows that $g(n)$ is $\Theta(h(n))$.

We'll do (h) for $\Theta^+$. By (a), there are positive constants $A_i$ and $B_i$ such that

$$A_1|f_1(n)| \le g_1(n) \le B_1|f_1(n)| \quad\text{and}\quad A_2|f_2(n)| \le g_2(n) \le B_2|f_2(n)|$$

for sufficiently large $n$. Adding gives

$$A_1|f_1(n)| + A_2|f_2(n)| \le g_1(n) + g_2(n) \le B_1|f_1(n)| + B_2|f_2(n)|. \tag{B.1}$$

We now do two things. First, let $A = \min(A_1, A_2)$ and note that

$$A_1|f_1(n)| + A_2|f_2(n)| \ge A\bigl(|f_1(n)| + |f_2(n)|\bigr) \ge A\max(|f_1(n)|, |f_2(n)|).$$

Second, let $B = 2\max(B_1, B_2)$ and note that

$$B_1|f_1(n)| + B_2|f_2(n)| \le \frac{B}{2}\bigl(|f_1(n)| + |f_2(n)|\bigr) \le \frac{B}{2}\cdot 2\max(|f_1(n)|, |f_2(n)|) = B\max(|f_1(n)|, |f_2(n)|).$$

Using these two results in (B.1) gives

$$A\max(|f_1(n)|, |f_2(n)|) \le g_1(n) + g_2(n) \le B\max(|f_1(n)|, |f_2(n)|),$$

which proves that $g_1(n) + g_2(n)$ is $\Theta^+(\max(|f_1(n)|, |f_2(n)|))$. □

Example B.2 Using the notation To illustrate these ideas, we'll consider three algorithms for
evaluating a polynomial $p(x)$ of degree $n$ at some point $r$; i.e., computing $p_0 + p_1 r + \cdots + p_n r^n$. We
are interested in how fast they are when $n$ is large. Here are the procedures. You should convince
yourself that they work.

    Poly1(n, p, r)
        S = p_0
        For i = 1, ..., n    S = S + p_i * Pow(r, i).
        Return S
    End

    Pow(r, i)
        P = 1
        For j = 1, ..., i    P = P * r.
        Return P
    End

    Poly2(n, p, r)
        S = p_0
        P = 1
        For i = 1, ..., n
            P = P * r
            S = S + p_i * P
        End for
        Return S
    End

    Poly3(n, p, r)
        S = p_n
        /* Here i decreases from n to 1. */
        For i = n, ..., 2, 1    S = S * r + p_{i-1}
        Return S
    End

Let $T_n(\text{Name})$ be the time required for the procedure Name. Let's analyze Poly1. The "For" loop in Pow is executed $i$ times and so takes $Ci$ operations for some constant $C > 0$. The setup and return in Pow takes some constant number of operations $D > 0$. Thus $T_n(\text{Pow}) = Ci + D$ operations. As a result, the $i$th iteration of the "For" loop in Poly1 takes $Ci + E$ operations for some constants $C$ and $E > D$. Adding this over $i = 1, 2, \ldots, n$, we see that the total time spent in the "For" loop is $\Theta^+(n^2)$ since $\sum_{i=1}^{n} i = n(n+1)/2$. (You should write out the details.) Since the rest of Poly1 takes $\Theta^+(1)$ time, $T_n(\text{Poly1})$ is $\Theta^+(n^2)$ by Theorem B.1(h).

Each iteration of the "For" loop of Poly2 takes a constant amount of time and the loop is executed $n$ times. It follows that $T_n(\text{Poly2})$ is $\Theta^+(n)$. The same analysis applies to Poly3.

What can we conclude from this about the comparative speed of the algorithms? By Theorem B.1(a) for $\Theta^+$, there are positive reals $A$ and $B$ so that $An^2 \le T_n(\text{Poly1})$ and $T_n(\text{Poly2}) \le Bn$ for sufficiently large $n$. Thus $T_n(\text{Poly2})/T_n(\text{Poly1}) \le B/(An)$. As $n$ gets larger, Poly2 looks better and better compared to Poly1.

Unfortunately, the crudeness of $\Theta$ does not allow us to make any distinction between Poly2 and Poly3. What we can say is that $T_n(\text{Poly2})$ is $\Theta^+(T_n(\text{Poly3}))$; i.e., $T_n(\text{Poly2})$ and $T_n(\text{Poly3})$ grow at the same rate. A more refined estimate can be obtained by counting the actual number of operations involved. You should compare the number of multiplications required and thereby obtain a more refined estimate. □
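One way to carry out the suggested comparison is to run the three procedures and count multiplications. The following Python sketch is only an illustration: the function names `poly1`, `poly2` and `poly3`, the coefficient list `p` (with `p[i]` standing for $p_i$), and the decision to count only multiplications are assumptions made here, not part of the text.

```python
def poly1(p, r):
    """Poly1: recompute each power of r from scratch, as Pow does."""
    mults = 0
    s = p[0]
    for i in range(1, len(p)):
        power = 1
        for _ in range(i):        # Pow(r, i) uses i multiplications
            power *= r
            mults += 1
        s += p[i] * power         # one more multiplication
        mults += 1
    return s, mults


def poly2(p, r):
    """Poly2: keep a running power of r."""
    mults = 0
    s, power = p[0], 1
    for i in range(1, len(p)):
        power *= r
        s += p[i] * power
        mults += 2                # two multiplications per iteration
    return s, mults


def poly3(p, r):
    """Poly3: work down from the leading coefficient (Horner's rule)."""
    mults = 0
    s = p[-1]
    for i in range(len(p) - 1, 0, -1):
        s = s * r + p[i - 1]
        mults += 1                # one multiplication per iteration
    return s, mults


if __name__ == "__main__":
    coeffs = [5, -3, 0, 2, 7]     # p(x) = 5 - 3x + 2x^3 + 7x^4, so n = 4
    for f in (poly1, poly2, poly3):
        print(f.__name__, f(coeffs, 3))
```

Under these assumptions the counts are $n(n+3)/2$, $2n$ and $n$ multiplications respectively, so the count separates Poly2 from Poly3 even though both are $\Theta^+(n)$.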

So far we've talked about how long an algorithm takes to run as if this were a simple, clear concept. In the next example we'll see that there's an important point that we've ignored.
Example B.3  What is average running time?  Let's consider the problem of (a) deciding whether or not a simple graph¹ can be properly colored with four colors and, (b) if a proper coloring exists, producing one. (A proper coloring is a function $\lambda\colon V \to C$, the set of colors, such that, if $\{u, v\}$ is an edge, then $\lambda(u) \ne \lambda(v)$.) We may as well assume that $V = \underline{n} = \{1, 2, \ldots, n\}$ and that the colors are $c_1$, $c_2$, $c_3$ and $c_4$.

Here's a simple algorithm to determine a $\lambda$ by using backtracking to go lexicographically through possible colorings $\lambda(1), \lambda(2), \ldots, \lambda(n)$.

1. Initialize: Set $v = 1$ and $\lambda(1) = c_1$.

2. Advance in decision tree: If $v = n$, stop with $\lambda$ determined; otherwise, set $v = v + 1$ and $\lambda(v) = c_1$.

3. Test: If $\lambda(i) \ne \lambda(v)$ for all $i < v$ for which $\{i, v\} \in E$, go to Step 2.

4. Select next decision: Let $j$ be such that $\lambda(v) = c_j$. If $j < 4$, set $\lambda(v) = c_{j+1}$ and go to Step 3.

5. Backtrack: If $v = 1$, stop with coloring impossible; otherwise, set $v = v - 1$ and go to Step 4.

How fast is this algorithm? Obviously it will depend on the graph. For example, if the subgraph induced by the first five vertices is the complete graph $K_5$ (i.e., all of the ten possible edges are present), then the algorithm stops after trying to color the first five vertices and discovering that there is no proper coloring. If the last five vertices induce $K_5$ and the remaining $n - 5$ vertices have no edges, then the algorithm makes $\Theta^+(4^n)$ assignments of the form $\lambda(v) = c_k$.
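Steps 1 through 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `four_color` and `late_k5`, the representation of colors $c_1, \ldots, c_4$ by the integers 1 to 4, and the returned assignment counter are all choices made here.

```python
def four_color(n, edges):
    """Backtracking search following Steps 1-5.

    Returns (coloring, assignments): coloring is a list of colors 1..4 for
    vertices 1..n, or None if no proper coloring exists; assignments counts
    how many times some lambda(v) was set.
    """
    adj = {v: set() for v in range(1, n + 1)}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    color = [0] * (n + 1)                  # color[v] for v = 1..n; index 0 unused

    v = 1                                  # Step 1: initialize
    color[1] = 1
    assignments = 1
    while True:
        # Step 3: test vertex v against its earlier neighbors
        if all(color[i] != color[v] for i in adj[v] if i < v):
            if v == n:                     # Step 2: advance in the decision tree
                return color[1:], assignments
            v += 1
            color[v] = 1
            assignments += 1
            continue
        while color[v] == 4:               # Step 5: backtrack past exhausted vertices
            if v == 1:
                return None, assignments
            v -= 1
        color[v] += 1                      # Step 4: select the next color
        assignments += 1


def late_k5(n):
    """Edges of a graph whose last five vertices induce K5; the rest are isolated."""
    last = range(n - 4, n + 1)
    return [(u, w) for u in last for w in last if u < w]


if __name__ == "__main__":
    for n in range(5, 10):
        print(n, four_color(n, late_k5(n))[1])
```

With the complete graph on the last five vertices, each extra isolated vertex in front multiplies the work by roughly four, which is the exponential growth described above.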

It's reasonable to talk about the average time the algorithm takes if we expect to give it lots of graphs to look at. Most $n$ vertex graphs will have many sets of five vertices that induce $K_5$. (We won't prove this.) Thus, we should expect that the algorithm will stop quickly on most graphs. In fact, it can be proved that the average number of assignments of the form $\lambda(v) = c_k$ that are made is $\Theta^+(1)$. This means that the average running time of the algorithm is bounded for all $n$, which is quite good!

Now suppose you give this algorithm to a friend, telling him that on average the running time is practically independent of the number of vertices. He thanks you profusely for such a wonderful algorithm and puts it to work coloring randomly generated "planar" graphs. By the Four Color Theorem, every planar graph can be properly colored with four colors, so the algorithm must make at least $n$ assignments of the form $\lambda(v) = c_k$. (Actually it will almost surely make many, many more.) Your friend soon comes back to you complaining bitterly.

What  went  wrong?  In  our  previous  discussion  we  were  averaging  over  all  simple  graphs  with 
n  vertices.  Your  friend  was  interested  in  the  average  over  all  simple  planar  graphs  with  n  vertices. 
These  averages  are  very  different!  There  is  a  moral  here: 

You  must  be  VERY  clear  what  you  are  averaging  over. 

Because situations like this do occur in real life, computer scientists are careful to specify what kind of running time they are talking about: either the average of the running time over some reasonable, clearly specified set of problems or the worst (longest) running time over all possibilities. □


¹ A "simple graph" is a set $V$, called vertices, and a set $E$ of two element subsets of $V$, called edges. One thinks of an edge $\{u, v\}$ as joining the two vertices $u$ and $v$.
Example B.4  Which algorithm is faster?  Suppose we have two algorithms for a problem, one of which is $\Theta^+(n^2)$ and the other of which is $\Theta^+(n^2 \ln\ln n)$. Which is better?² It would seem that the first algorithm is better since $n^2$ grows slower than $n^2 \ln\ln n$. That's true if $n$ is large enough. How large is large enough? In other words, what is the crossover point, the time when we should switch from one algorithm to the other in the interests of speed? To decide, we need to know more than just $\Theta^+(\ )$ because that notation omits constants. Suppose one algorithm has running time close to $3n^2$, and the other, close to $n^2 \ln\ln n$. The second algorithm is better as long as $3 > \ln\ln n$. Exponentiating twice, we get $n < e^{e^3}$, which is about $5 \times 10^8$. This is a large crossover point! On the other hand, suppose the first algorithm's running time is close to $n^2$. In that case, the second algorithm is better as long as $1 > \ln\ln n$, that is, $n < e^e$, which is about 15. A slight change in the constant caused a huge change in the crossover point!
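The two crossover points can be checked numerically. The short Python sketch below simply evaluates $e^{e^c}$ for a constant $c$; the function name `crossover` is an illustrative choice.

```python
import math


def crossover(c):
    """Value of n at which c = ln ln n, i.e. n = exp(exp(c))."""
    return math.exp(math.exp(c))


if __name__ == "__main__":
    print(f"constant 3: n < {crossover(3):.3g}")
    print(f"constant 1: n < {crossover(1):.3g}")
```

Changing the constant from 3 to 1 moves the crossover from several hundred million down to about fifteen, as claimed.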

If slight changes in constants matter this much in locating crossover points, what good is $\Theta^+(\ )$ notation? We've misled you! The crossover points are not that important. What matters is how much faster one algorithm is than another. If one algorithm has running time $An^2$ and the other has running time $Bn^2 \ln\ln n$, the ratio of their speeds is $(B/A)\ln\ln n$. This is fairly close to $B/A$ for a large range of $n$ because the function $\ln\ln n$ grows so slowly. In other words, the running time of the two algorithms differs by nearly a constant factor for practical values of $n$.

We're still faced with a problem when deciding between two algorithms since we don't know the constants. Suppose both algorithms are $\Theta^+(n^2)$. Which do we choose? If you want to be certain you have the faster algorithm, you'll either have to do some very careful analysis of the code or run the algorithms and time them. However, there is a rule of thumb that works pretty well: More complicated data structures lead to larger constants.

Let's summarize what we've learned in the last two paragraphs. Suppose we want to choose the faster of Algorithm A whose running time is $\Theta^+(a(n))$ and Algorithm B whose running time is $\Theta^+(b(n))$.

•  If possible, simplify $a(n)$ and $b(n)$ and ignore all slowly growing functions of $n$ such as $\ln\ln n$. ("What about $\ln n$?" you ask. That's a borderline situation. It's usually better to keep it.) This gives two new functions $a^*(n)$ and $b^*(n)$.

•  If $a^*(n) = \Theta^+(b^*(n))$, choose the algorithm with the simpler data structures; otherwise, choose the algorithm with the smaller function.

These rules are far from foolproof, but they provide some guidance. □

We now define two more notations that are useful in discussing rate of growth. The notation $o$ is read "little oh". There is no standard convention for reading $\sim$.

Definition B.2  Notation for $o$ and $\sim$  Let $f$, $g$ and $h$ be functions from the positive integers to the real numbers.

•  If $\lim_{n\to\infty} g(n)/f(n) = 1$, we say that $g(n) \sim f(n)$. In this case, we say that $f$ and $g$ are asymptotically equal.

•  If $\lim_{n\to\infty} h(n)/f(n) = 0$, we say that $h(n)$ is $o(f(n))$. In this case, we say that $h$ grows slower than $f$ or, equivalently, that $f$ grows faster than $h$.

² This situation actually occurs; see the discussion at the end of Example 6.3 (p. 152).
Asymptotic equality has many of the properties of equality. The two main exceptions are cancellation and exponentiation:

•  You should verify that $n^2 + 1 \sim n^2 + n$ and cancelling the $n^2$ from both sides is bad because $1 \sim n$ is false.

•  You should verify that exponentiation is bad by showing that $e^{n^2+1} \sim e^{n^2+n}$ is false. In fact, you should verify that $e^{n^2+1}$ is $o(e^{n^2+n})$.
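A quick numerical look at both ratios makes the two exceptions concrete. The Python sketch below is only a sanity check, not a proof; the names `poly_ratio` and `exp_ratio` are illustrative, and the exponential ratio is computed in the simplified form $e^{1-n}$ to avoid overflow.

```python
import math


def poly_ratio(n):
    """(n^2 + 1) / (n^2 + n), which should approach 1."""
    return (n**2 + 1) / (n**2 + n)


def exp_ratio(n):
    """e^(n^2 + 1) / e^(n^2 + n), simplified to e^(1 - n); should approach 0."""
    return math.exp(1 - n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, poly_ratio(n), exp_ratio(n))
```

The first column of ratios drifts toward 1 while the second collapses toward 0, so the functions are asymptotically equal but their exponentials are not.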

In the following theorem, we see that asymptotic equality defines an equivalence relation (d), allows multiplication and division (c,e), and allows addition of functions with the same sign (f). You should verify that Theorem B.1 says the same thing for $\Theta(\ )$.

Theorem B.2  Some properties of $o$ and $\sim$  We have

(a) If $g(n)$ is $o(f(n))$, then $g(n)$ is $O(f(n))$.
If $f(n) \sim g(n)$, then $f(n)$ is $\Theta(g(n))$.

(b) $f(n)$ is not $o(f(n))$.

(c) If $g(n)$ is $o(f(n))$ and $C$ and $D$ are nonzero constants, then $Cg(n)$ is $o(Df(n))$.
If $g(n) \sim f(n)$ and $C$ is a nonzero constant, then $Cg(n) \sim Cf(n)$.

(d) "$g(n) \sim f(n)$" defines an equivalence relation.

(e) If $g_1(n)$ is $o(f_1(n))$ and $g_2(n)$ is $o(f_2(n))$, then $g_1(n)g_2(n)$ is $o(f_1(n)f_2(n))$.
If $g_1(n) \sim f_1(n)$ and $g_2(n) \sim f_2(n)$, then $g_1(n)g_2(n) \sim f_1(n)f_2(n)$ and $g_1(n)/g_2(n) \sim f_1(n)/f_2(n)$.

(f) If $g_1(n)$ is $o(f_1(n))$ and $g_2(n)$ is $o(f_2(n))$, then $g_1(n) + g_2(n)$ is $o(\max(f_1(n), f_2(n)))$.
If $g_1(n) \sim f_1(n)$ and $g_2(n) \sim f_2(n)$ and $f_1(n)$ and $f_2(n)$ are nonnegative for all sufficiently large $n$, then $g_1(n) + g_2(n) \sim f_1(n) + f_2(n)$.

(g) If $h(n)$ is $o(f(n))$ and $g(n) \sim f(n)$, then $g(n) + h(n) \sim f(n)$.
If $h(n)$ is $o(f(n))$ and $g(n)$ is $\Theta(f(n))$, then $g(n) + h(n)$ is $\Theta(f(n))$.
If $h(n)$ is $o(f(n))$ and $g(n)$ is $O(f(n))$, then $g(n) + h(n)$ is $O(f(n))$.

(h) If $g_1(n)$ is $o(f_1(n))$ and $g_2(n)$ is $O(f_2(n))$, then $g_1(n)g_2(n)$ is $o(f_1(n)f_2(n))$.

The proof is left as an exercise. We'll see some applications in the next section.

Exercises 


B.1.1.  Prove the properties of $\Theta(\ )$, $\Theta^+(\ )$, $O(\ )$, and $O^+(\ )$ given in Theorem B.1.

B.1.2.  Prove by example that (g) in Theorem B.1 does not hold for $O$.

B.1.3.  Prove or disprove the statement:

"$g(n)$ is $O(f(n))$" defines an equivalence relation for functions from the positive integers to the nonnegative reals (as did the corresponding statement for $\Theta$).

B.1.4.  In each case, prove that $f(x)$ is $\Theta(g(x))$ using the definition of $\Theta(\ )$.

(a)  f{x)  =      +  5x^  +  10,  g{x)  =  20x^ . 

(b)  fix)  =      +  5x-2  +  10,  g{x)  =  200x'^ . 
B.1.5.  In each case, show that the given series has the indicated property.

(a)  ELi^'  ise(n3). 

(b)  Er=i«'i« 

(c)  Er=i^'/'  ise(n3/2). 
B.1.6.  Show  each  of  the  following 

(a)  Er=i  Q(log6(^))  for  any  base  6  >  1. 

(b)  log6(n!)  is  0(nlog(,(n))  for  any  base  6  >  1 

B.1.7.  We have three algorithms for solving a problem for graphs. Suppose algorithm A takes  milliseconds to run on a graph with $n$ vertices, algorithm B takes $100n$ milliseconds and algorithm C takes $100(2^{n/10} - 1)$ milliseconds.

(a)  Compute  the  running  times  for  the  three  algorithms  with  n  =  5,  10,  30,  100  and  300.  Which 
algorithm  is  fastest  in  each  case?  slowest? 

(b)  Which  algorithm  is  fastest  for  all  very  large  values  of  n?  Which  is  slowest? 
B.1.8.  Prove  Theorem  B.2. 

B.1.9.  Let $p(x)$ be a polynomial of degree $k$ with positive leading coefficient and suppose that $a > 1$. Prove the following.

(a) $\Theta(p(n))$ is $\Theta(n^k)$.

(b) $O(p(n))$ is $O(n^k)$.

(c) $p(n) = o(a^n)$. (Also, what does this say about the speed of a polynomial time algorithm versus one which takes exponential time?)

(d) $O(a^{p(n)})$ is $O(a^{Cn^k})$ for some $C > 0$.

(e) Unless $p(x) = p_1 x^k + p_2$ for some $p_1$ and $p_2$, there is no $C$ such that

B.1.10.  Suppose $1 < a < b$ and $f(n) \to +\infty$ as $n \to \infty$. Prove that

$$a^{f(n)} = o(b^{f(n)}) \quad\text{and}\quad a^{g(n)} = o(a^{f(n)+g(n)}),$$

for all functions $g(n)$.

B.1.11.  Consider the following algorithm for determining if the distinct integers $a_1, a_2, \ldots, a_n$ are in increasing order.

    For i = 2, ..., n
        If a_{i-1} > a_i, return "out of order."
    End for
    Return "in order."

(a) Discuss worst case running time.

(b) Discuss average running time for all permutations of $n$.

(c) Discuss average running time for all permutations of $2n$ such that $a_1 < a_2 < \cdots < a_n$.
B.2   Doing  Arithmetic


If we try to use Theorems B.1 and B.2 in a series of calculations, the lack of arithmetic notation becomes awkward. You're already familiar with the usefulness of notation; for example, when one understands the notation, it is easier to understand and verify the statement

$$(ax - b)^2 = a^2x^2 - 2abx + b^2$$

than  it  is  to  understand  and  verify  the  statement 

The  square  of  the  difference  between  the  product  of  a  and  x  and  b  equals  the  square  of  a 
times  the  square  of  x  decreased  by  twice  the  product  of  a,  b  and  x  and  increased  by  the 
square  of  b. 

If we simply interpret "is" in the theorems as an equality, we run into problems. For example, we would then say that since $n = \Theta(n)$ and $n + 1 = \Theta(n)$, we would have $n = n + 1$. How can we introduce arithmetic notation and avoid such problems? The key is to redefine $\Theta$, $O$ and $o$ slightly using sets. Let $\Theta^*$, $O^*$ and $o^*$ be our old definitions. Our new ones are:

•  $\Theta(f(n)) = \{g(n) \mid g(n) \text{ is } \Theta^*(f(n))\}$,

•  $\Theta^+(f(n)) = \{g(n) \mid g(n) \text{ is } \Theta^*(f(n)) \text{ and } g(n) \text{ is positive for large } n\}$,

•  $O(f(n)) = \{g(n) \mid g(n) \text{ is } O^*(f(n))\}$,

•  $O^+(f(n)) = \{g(n) \mid g(n) \text{ is } O^*(f(n)) \text{ and } g(n) \text{ is positive for large } n\}$,

•  $o(f(n)) = \{g(n) \mid g(n) \text{ is } o^*(f(n))\}$.

If we replace "is" with "is in", Theorems B.1 and B.2 are still correct. For example, the last part of Theorem B.1(b) becomes

If $g(n) \in \Theta(f(n))$, then $g(n) \in O(f(n))$.

We want to make two other changes:

•  Replace functions, numbers, and so on with sets so that we use $\subseteq$ instead of $\in$. For example, instead of $5n \in O(n)$, we say $\{5n\} \subseteq O(n)$.

•  An arithmetic operation between sets is done element by element; for example,

$$A + B = \{a + b \mid a \in A \text{ and } b \in B\}.$$

Let's rewrite parts of Theorem B.1 using this notation:

(b) If $\{g(n)\} \subseteq \Theta^+(f(n))$, then $\{g(n)\} \subseteq \Theta(f(n))$.
If $\{g(n)\} \subseteq O^+(f(n))$, then $\{g(n)\} \subseteq O(f(n))$.
If $\{g(n)\} \subseteq \Theta(f(n))$, then $\{g(n)\} \subseteq O(f(n))$.

(f) If $\{g(n)\} \subseteq \Theta(f(n))$ and $\{f(n)\} \subseteq \Theta(h(n))$, then $\{g(n)\} \subseteq \Theta(h(n))$.
If $\{g(n)\} \subseteq O(f(n))$ and $\{f(n)\} \subseteq O(h(n))$, then $\{g(n)\} \subseteq O(h(n))$.

(i) If $\{g(n)\} \subseteq \Theta^+(f(n))$ and $\{h(n)\} \subseteq O^+(f(n))$, then $\{g(n) + h(n)\} \subseteq \Theta^+(f(n))$.

We leave it to you to translate other parts and to translate Theorem B.2.

In practice, people simplify the notation we've introduced by replacing things like $\{f(n)\}$ with $f(n)$, which is good since it makes these easier to read. They also replace $\subseteq$ with $=$, which is dangerous but is, unfortunately, the standard convention. We'll abide by these conventions, but will remind you of what we're doing by footnotes in the text.
Example B.5  Using the notation  The statement $f(n) \sim g(n)$ is equivalent to the statement $f(n) = g(n)(1 + o(1))$ and also to $f(n) = g(n) + o(g(n))$. The first is because $f(n)/g(n) \to 1$ if and only if $f(n)/g(n) = 1 + o(1)$. The second follows from

$$g(n)(1 + o(1)) = g(n) + g(n)o(1) = g(n) + o(g(n)) \quad\text{and}\quad g(n) + o(g(n)) = g(n) + g(n)o(1) = g(n)(1 + o(1)).$$

Why do we need the second of these statements?

*  *  *  Stop and think about this!  *  *  *

Remember that $=$ really means $\subseteq$, so the first statement shows that $g(n)(1 + o(1)) \subseteq g(n) + o(g(n))$ and the second shows that $g(n) + o(g(n)) \subseteq g(n)(1 + o(1))$. Taken together, the two statements show that the sets $g(n)(1 + o(1))$ and $g(n) + o(g(n))$ are equal and so $f(n)$ is in one if and only if it is in the other.

We can include functions of sets: Suppose $S$ is a subset of the domain of the function $F$, define $F(S) = \{F(s) \mid s \in S\}$. With this notation,

$$e^{o(1)} = 1 + o(1) = e^{o(1)} \quad\text{and}\quad e^{O(1)} = O^+(1);$$

however, $O^+(1) \ne e^{O(1)}$. Why is this?

*  *  *  Stop and think about this!  *  *  *

We have $e^{-n} \in O^+(1)$ but, since $n \notin O(1)$, $e^{-n} \notin e^{O(1)}$. □

Everything we've done so far is with functions from the positive integers to the reals and we've asked what happens as $n \to \infty$. We can have functions on other sets and ask what happens when we take a different limit. For example, the definition of a derivative can be written as

$$\frac{f(x+h) - f(x)}{h} \sim f'(x) \quad\text{as } h \to 0,$$

provided $f'(x) \ne 0$. Taylor's theorem with remainder can be written

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)\,x^k}{k!} + O(x^{n+1}) \quad\text{as } x \to 0,$$

provided $f^{(n+1)}(x)$ is well behaved near $x = 0$. Of course, this is not as good as the form in your calculus book because it says nothing about how big the error term $O(x^{n+1})$ is for a particular function $f(x)$.


B.3   NP-Complete  Problems 


Computer scientists talk about "polynomial time algorithms." What does this mean? Suppose that the algorithm can handle arbitrarily large problems and that it takes $\Theta(n)$ seconds on a problem of "size" $n$. Then we call it a linear time algorithm. More generally, if there is a (possibly quite large) integer $k$ such that the worst case running time on a problem of "size" $n$ is $O(n^k)$, then we say the algorithm is polynomial time.

You may have noticed the quotes around size and wondered why. It is necessary to specify what we mean by the size of a problem. Size is often interpreted as the number of bits required to specify the problem in binary form. You may object that this is imprecise since a problem can be specified in many ways. This is true; however, the number of bits in one "reasonable" representation doesn't differ too much from the number of bits in another. We won't pursue this further.

If the worst case time for an algorithm is polynomial, theoretical computer scientists think of this as a good algorithm. (This is because polynomials grow relatively slowly; for example, exponential functions grow much faster.) The problem that the algorithm solves is called tractable.

Do there exist intractable problems; i.e., problems for which no polynomial time algorithm can ever be found? Yes, but we won't study them here. More interesting is the fact that there are a large number of practical problems for which

•  no polynomial time algorithm is known and

•  no one has been able to prove that the problems are intractable.

We'll  discuss  this  a  bit.  Consider  the  following  problems. 

•  Coloring Problem: For any $c > 2$, devise an algorithm whose input can be any simple graph and whose output answers the question "Can the graph be properly colored in $c$ colors?"

•  Traveling Salesman Problem: For any $B$, devise an algorithm whose input can be any $n > 0$ and any edge labeling $\lambda\colon \mathcal{P}_2(\underline{n}) \to \mathbb{R}$ for $K_n$, the complete graph on $n$ vertices. The algorithm must answer the question "Is there a cycle through all $n$ vertices with cost $B$ or less?" (The cost of a cycle is the sum of $\lambda(e)$ over all $e$ in the cycle.)

•  Language Recognition Problem: Devise an algorithm whose input is two finite sets $S$ and $T$ and an integer $k$. The elements of $S$ and $T$ are finite strings of zeroes and ones. The algorithm must answer the question "Does there exist a finite automaton with $k$ states that accepts all strings in $S$ and accepts none of the strings in $T$?"

No one knows if these problems are tractable, but it is known that, if one is tractable, then they all are. There are hundreds more problems that people are interested in which belong to this particular list in which all or none are tractable. These problems are called NP-complete. Many people regard deciding if the NP-complete problems are tractable to be one of the foremost open problems in theoretical computer science.

The NP-complete problems have an interesting property which we now discuss. If the algorithm says "yes," then there must be a specific example that shows why this is so (an assignment of colors, a cycle, an automaton). There is no requirement that the algorithm actually produce such an example. Suppose we somehow obtain a coloring, a cycle or an automaton which is claimed to be such an example. Part of the definition of NP-complete requires that we be able to check the claim in polynomial time. Thus we can check a purported example quickly but, so far as is known, it may take a long time to determine if such an example exists. In other words, I can check your guesses quickly but I don't know how to tell you quickly if any examples exist.
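To make "checking a purported example quickly" concrete, here is a minimal Python sketch of two such checkers. The function names, the dictionary `coloring` (assumed to assign a color in 1..c to every vertex), and the dictionary `cost` keyed by two-element `frozenset`s are illustrative assumptions, not notation from the text. The tour checker assumes $n \ge 3$.

```python
def is_proper_coloring(edges, coloring, c):
    """Check a claimed c-coloring with one pass over the vertices and one over the edges."""
    if any(not (1 <= color <= c) for color in coloring.values()):
        return False
    return all(coloring[u] != coloring[v] for u, v in edges)


def is_cheap_tour(n, cost, tour, bound):
    """Check that tour visits each of the vertices 1..n exactly once and costs at most bound."""
    if sorted(tour) != list(range(1, n + 1)):
        return False
    # Sum the labels of consecutive edges, including the edge that closes the cycle.
    total = sum(cost[frozenset((tour[i], tour[(i + 1) % n]))] for i in range(n))
    return total <= bound


if __name__ == "__main__":
    triangle = [(1, 2), (2, 3), (1, 3)]
    print(is_proper_coloring(triangle, {1: 1, 2: 2, 3: 3}, 3))
    cost = {frozenset((1, 2)): 1, frozenset((2, 3)): 2, frozenset((1, 3)): 3}
    print(is_cheap_tour(3, cost, [1, 2, 3], 6))
```

Both checks run in time polynomial in the size of the input, whereas no polynomial time method is known for deciding whether a valid coloring or a cheap enough cycle exists at all.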

There are problems like the NP-complete problems where no one knows how to do any checking in polynomial time. For example, modify the traveling salesman problem to ask for the minimum cost cycle. No one knows how to verify in polynomial time that a given cycle is actually the minimum cost cycle. If the modified traveling salesman problem is tractable, so is the one we presented above: You need only find the minimum cost cycle and compare its cost to $B$. Such problems are called NP-hard because they are at least as hard as NP-complete problems. A problem which is tractable if the NP-complete problems are tractable is called NP-easy. Some problems are both NP-easy and NP-hard but may not be NP-complete. Why is this? NP-complete problems must ask a "yes/no" type of question and it must be possible to check a specific example in polynomial time as noted in the previous paragraph.

What  can  we  do  if  we  cannot  find  a  good  algorithm  for  a  problem?  There  are  three  main  types 
of  partial  algorithms: 

1.  Almost good: It is polynomial time for all but a very small subset of possible problems. (If we are interested in all graphs, our coloring algorithm in Example B.3 is almost good for any fixed $c$.)

2.  Almost correct: It is polynomial time but in some rare cases does not find the correct answer. (If we are interested in all graphs and a fixed $c$, automatically reporting that a large graph can't be colored with $c$ colors is almost correct, but it is rather useless.) In some situations, a fast almost correct algorithm can be useful.

3.  Close: It is a polynomial time algorithm for a minimization problem and comes close to the true minimum. (There are useful close algorithms for approximating the minimum cycle in the Traveling Salesman Problem.)

Some  of  the  algorithms  make  use  of  random  number  generators  in  interesting  ways.  Unfortunately, 
further  discussion  of  these  problems  is  beyond  the  scope  of  this  text. 

Exercises 


B.3.1.  The chromatic number $\chi(G)$ of a graph $G$ is the least number of colors needed to properly color $G$. Using the fact that the problem of deciding whether a graph can be properly colored with $c$ colors is NP-complete, prove the following.

(a) The problem of determining $\chi(G)$ is NP-hard.

(b) The problem of determining $\chi(G)$ is NP-easy.

Hint.  You can color $G$ with $c$ colors if and only if $c \ge \chi(G)$.

B.3.2.  The bin packing problem can be described as follows. Given a set $S$ of positive integers and integers $B$ and $K$, is there a partition of $S$ into $K$ blocks so that the sum of the integers in each block does not exceed $B$? This problem is known to be NP-complete.

(a) Prove that the following modified problem is NP-easy and NP-hard. Given a set $S$ of positive integers and an integer $B$, what is the smallest $K$ such that the answer to the bin packing problem is "yes?"

(b) Call the solution to the modified bin packing problem $K(S, B)$. Prove that

$$K(S, B) \ge \frac{1}{B} \sum_{s \in S} s.$$

(c) The "First Fit" algorithm obtains an upper bound on $K(S, B)$. We now describe it. Start with an infinite sequence of boxes (bins) $B_1, B_2, \ldots$. Each box can hold any number of integers as long as their sum doesn't exceed $B$. Let $s_1, s_2, \ldots$ be some ordering of $S$. If the $s_i$'s are placed in the $B_j$'s, the nonempty boxes form an ordered partition of $S$ and so the number of them is an upper bound for $K(S, B)$. For $i = 1, 2, \ldots, |S|$, place $s_i$ in the $B_j$ with the lowest index such that it will not make the sum of the integers in $B_j$ exceed $B$. Estimate the running time of the algorithm in terms of $|S|$, $B$ and the number of bins actually used.

(d)  Call  the  bound  on  K  obtained  by  the  First  Fit  algorithm  FF{S,B).  Prove  that  FF{S,B)  < 
2K{S,B)  +  1. 

Hint.  When  First  Fit  is  done,  which  bins  can  be  at  most  half  full? 
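For readers who want to experiment with First Fit before analyzing it, the following Python sketch follows the description in part (c). It is an illustration only: the name `first_fit`, the list-of-lists representation of the bins, and the assumption that no single integer exceeds the capacity are choices made here.

```python
def first_fit(items, capacity):
    """Place each item in the lowest-indexed bin that still has room; return the bins.

    Assumes every item is a positive integer not exceeding capacity.
    """
    bins = []     # bins[j] lists the items placed in bin j
    loads = []    # loads[j] is the sum of the items in bin j
    for s in items:
        for j, load in enumerate(loads):
            if load + s <= capacity:
                bins[j].append(s)
                loads[j] += s
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([s])
            loads.append(s)
    return bins


if __name__ == "__main__":
    print(first_fit([4, 8, 1, 4, 2, 1], 10))
```

The number of bins returned is the upper bound on $K(S, B)$ that the exercise refers to; different orderings of $S$ can give different bounds.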
APPENDIX  C


Basic  Probability 


This appendix is a rapid introduction to the concepts from probability theory that are needed in the text. It is not intended to substitute for a basic course in probability theory, which we strongly recommend for anyone planning either to apply combinatorics (especially in computer science) or to delve into combinatorics for its own sake.


C.1   Probability  Spaces  and  Random  Variables


For  simplicity,  we  will  limit  our  attention  to  finite  probability  spaces  and  to  real-valued  random 
variables. 

Definition C.1  Finite probability space  A finite probability space is a finite set $S$ together with a function $\Pr\colon S \to \mathbb{R}$ such that

$$0 \le \Pr(s) \le 1 \ \text{ for all } s \in S \quad\text{and}\quad \sum_{s \in S} \Pr(s) = 1.$$

We call the elements of $S$ elementary events and the subsets of $S$ events. For $T \subseteq S$, we define

$$\Pr(T) = \sum_{t \in T} \Pr(t).$$

Note that $\Pr(\emptyset) = 0$, $\Pr(S) = 1$ and $\Pr(s) = \Pr(\{s\})$ for $s \in S$.

If $A(s)$ is a statement that makes sense for $s \in S$, we define $\Pr(A) = \Pr(T)$, where $T$ is the set of all $t \in S$ for which $A(t)$ is true.

One often has $\Pr(s) = 1/|S|$ for all $s \in S$. In this case, $\Pr(T) = |T|/|S|$, the fraction of elements in $S$ that lie in $T$. In this case, we call $\Pr$ the uniform distribution on $S$.
The terminology "event" can sometimes be misleading. For example, if $S$ consists of all $2^{|A|}$ subsets of $A$, an elementary event is a subset of $A$. Suppose $\Pr$ is the uniform distribution on $S$. If $T$ is the event consisting of all subsets of size $k$, then $\Pr(T)$ is the fraction of subsets of size $k$. We say that $\Pr(T)$ is the probability that a subset of $A$ chosen uniformly at random has size $k$. "Uniformly" is often omitted and we simply say that the probability a randomly chosen subset has size $k$ is $|T|/|S| = \binom{|A|}{k} 2^{-|A|}$. In statement notation,

$$\Pr(\text{a subset has size } k) = \binom{|A|}{k} 2^{-|A|}.$$

The notion of the probability of a statement being true is neither more nor less general than the notion of the probability of a subset of the probability space $S$. To see this, note that

•  with any statement $\mathcal{A}$ we can associate the set $A$ of elements of $S$ for which $\mathcal{A}$ is true while

•  with any subset $T$ of $S$ we can associate the statement "$t \in T$."

Here  are  some  simple,  useful  properties  of  Pr.  You  should  be  able  to  supply  the  proofs  by  writing 
all  probabilities  as  sums  of  probabilities  of  elementary  events  and  noticing  which  elementary  events 
appear  in  which  sums. 

Theorem C.1  Suppose $(S, \Pr)$ is a probability space and $A$, $A_1, \ldots, A_k$ and $B$ are subsets of $S$.

(a) If $A \subseteq B$, then $0 \le \Pr(A) \le \Pr(B) \le 1$.

(b) $\Pr(S \setminus A) + \Pr(A) = 1$. One also writes $S - A$, $A^c$ and $A'$ for $S \setminus A$.

(c) $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$.

(d) $\Pr(A_1 \cup \cdots \cup A_k) \le \Pr(A_1) + \cdots + \Pr(A_k)$.

We now define a "random variable." It is neither a variable nor random; rather, it is a function:

Definition C.2  Random variable  Given a probability space $(S, \Pr)$, a random variable on the space is a function $X\colon S \to \mathbb{R}$.

People  often  use  capital  letters  near  the  end  of  the  alphabet  to  denote  random  variables.  Why  the 
name  "random  variable"  for  a  function?  The  terminology  arose  historically  before  probability  the- 
ory was  put  on  a  mathematical  foundation.  For  example,  suppose  you  toss  a  coin  10  times  and  let  X 
be  the  number  of  heads.  The  value  of  X  varies  with  the  results  of  your  tosses  and  it  is  random  be- 
cause your  tosses  are  random.  In  probability  theory  terms,  if  the  coin  tosses  are  fair,  we  can  define 
a  probability  space  and  a  random  variable  as  follows. 

• $S$ is the set of all $2^{10}$ possible 10-long head-tail sequences,

• $\Pr(s) = 1/|S| = 2^{-10} = 1/1024$,

• $X(s)$ equals the number of heads in the 10-long sequence $s$.

The probability that you get exactly four heads can be written as $\Pr(X = 4)$, which equals $\binom{10}{4}2^{-10}$ since there are $\binom{10}{4}$ 10-long head-tail sequences that contain exactly four heads.
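The coin-toss space is small enough to enumerate directly. The following is a minimal sketch, assuming Python's standard library; the names `S`, `pr` and `X` simply mirror the notation above and are otherwise illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Sample space S: all 2**10 head-tail sequences of length 10.
S = list(product("HT", repeat=10))

def pr(s):
    """Uniform probability of one elementary event."""
    return Fraction(1, len(S))

def X(s):
    """The random variable: number of heads in the sequence s."""
    return s.count("H")

# Pr(X = 4) is the sum of Pr(s) over all s with X(s) = 4.
pr_four_heads = sum(pr(s) for s in S if X(s) == 4)
```

Exact fractions are used so that the result can be compared with $\binom{10}{4}2^{-10}$ without rounding.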
Definition C.3 Independence Let $(S, \Pr)$ be a probability space and let $\mathcal{X}$ be a set of random variables on $(S, \Pr)$. We say that the random variables in $\mathcal{X}$ are mutually independent if, for every subset $\{X_1, \ldots, X_k\}$ of $\mathcal{X}$ and all real numbers $x_1, \ldots, x_k$, we have

$$\Pr(X_1 = x_1 \text{ and } \cdots \text{ and } X_k = x_k) = \Pr(X_1 = x_1) \cdots \Pr(X_k = x_k),$$

where the probabilities on the right are multiplied. We often abbreviate "mutual independence" to "independence."

Intuitively, the concept of independence means that knowing the values of some of the random variables gives no information about the values of others. For example, consider tossing a fair coin randomly ten times to produce a 10-long sequence of heads and tails. Define

$$X_i = \begin{cases} 1, & \text{if toss } i \text{ is heads,} \\ 0, & \text{if toss } i \text{ is tails.} \end{cases} \tag{C.1}$$

Then the random variables in the set $\{X_1, \ldots, X_{10}\}$ are mutually independent.

We now look at "product spaces." They arise in natural ways and lead naturally to independence.

Definition C.4 Product space Let $(S_1, \Pr_1), \ldots, (S_n, \Pr_n)$ be probability spaces. The product space

$$(S, \Pr) = (S_1, \Pr\nolimits_1) \times \cdots \times (S_n, \Pr\nolimits_n) \tag{C.2}$$

is defined by $S = S_1 \times \cdots \times S_n$, a Cartesian product, and

$$\Pr((a_1, \ldots, a_n)) = \Pr\nolimits_1(a_1) \cdots \Pr\nolimits_n(a_n) \quad \text{for all } (a_1, \ldots, a_n) \in S.$$

We may write $\Pr(a_1, \ldots, a_n)$ instead of $\Pr((a_1, \ldots, a_n))$.

As  an  example,  consider  tossing  a  fair  coin  randomly  ten  times.  The  probability  space  for 
a  single  toss  is  the  set  {H,T},  indicating  heads  and  tails,  with  the  uniform  distribution.  The 
product  of  ten  copies  of  this  space  is  the  probability  space  for  ten  random  tosses  of  a  fair 
coin. 


Theorem C.2 Independence in product spaces Suppose a product space $(S, \Pr)$ is given by (C.2). Let $I_1, \ldots, I_m$ be pairwise disjoint subsets of $\underline{n}$; that is, $I_i \cap I_j = \emptyset$ whenever $i \ne j$. Suppose $X_k$ is a random variable whose value on $(a_1, \ldots, a_n)$ depends only on the values of those $a_i$ for which $i \in I_k$. Then $X_1, \ldots, X_m$ are mutually independent.

We omit the proof.

Continuing with our coin toss example, let $I_i = \{i\}$ and define $X_i$ by (C.1). By the theorem, the $X_i$ are mutually independent.
Expectation and Variance


Definition C.5 Expectation Let $(S, \Pr)$ be a probability space. The expectation of a random variable $X$ is

$$E(X) = \sum_{s \in S} X(s)\Pr(s).$$

The expectation is also called the mean.

Let $R \subset \mathbb{R}$ be the set of values taken on by $X$. By collecting terms in the sum over $S$ according to the value of $X(s)$ we can rewrite the definition as

$$E(X) = \sum_{r \in R} r\Pr(X = r).$$

In this form, it should be clear that the expected value of a random variable can be thought of as its average value.

Definition C.6 Variance Let $(S, \Pr)$ be a probability space. The variance of a random variable $X$ is

$$\operatorname{var}(X) = \sum_{s \in S} (X(s) - E(X))^2 \Pr(s) = \sum_{r \in R} (r - E(X))^2 \Pr(X = r),$$

where $R \subset \mathbb{R}$ is the set of values taken on by $X$.

The variance of a random variable measures how much it tends to deviate from its expected value. One might think that

$$\sum_{r \in R} |r - E(X)| \Pr(X = r)$$

would be a better measure; however, there are computational and theoretical reasons for preferring the variance.

We often have a random variable $X$ that takes on only the values 0 and 1. Let $p = \Pr(X = 1)$. You should be able to prove that $E(X) = p$ and $\operatorname{var}(X) = p(1 - p)$.

The following theorem can be proved by algebraic manipulations of the definitions of mean and variance. Since this is an appendix, we omit the proof.

Theorem C.3 Properties of Mean and Variance Let $X_1, \ldots, X_k$ and $Y$ be random variables on a probability space $(S, \Pr)$.

(a) For real numbers $a$ and $b$, $E(aY + b) = aE(Y) + b$ and $\operatorname{var}(aY + b) = a^2\operatorname{var}(Y)$.

(b) $\operatorname{var}(Y) = E((Y - E(Y))^2) = E(Y^2) - (E(Y))^2$.

(c) $E(X_1 + \cdots + X_k) = E(X_1) + \cdots + E(X_k)$.

(d) If the $X_i$ are independent, then $\operatorname{var}(X_1 + \cdots + X_k) = \operatorname{var}(X_1) + \cdots + \operatorname{var}(X_k)$.

Chebyshev's Theorem tells us that it is unlikely that the value of a random variable will be far from its mean, where the "unit of distance" is the square root of the variance. (The square root of $\operatorname{var}(X)$ is also called the standard deviation of $X$ and is written $\sigma_X$.)

Theorem C.4 Chebyshev's inequality If $X$ is a random variable on a probability space and $t > 1$, then

$$\Pr\left(|X - E(X)| > t\sqrt{\operatorname{var}(X)}\right) < \frac{1}{t^2}. \tag{C.3}$$


For example, if a fair coin is tossed $n$ times and $X$ is the number of heads, then one can show that

$$E(X) = n/2 \quad \text{and} \quad \operatorname{var}(X) = n/4. \tag{C.4}$$

Chebyshev's inequality tells us that $X$ is not likely to be many multiples of $\sqrt{n}$ from $n/2$. Specifically, it says that

$$\Pr\left(|X - n/2| > (t/2)\sqrt{n}\right) < \frac{1}{t^2}.$$

Let's use Theorem C.3 to prove (C.4). Let $Y_k$ be a random variable which is 1 if toss $k$ lands heads and is 0 if it lands tails. By the definition of mean,

$$E(Y_k) = 0 \cdot \Pr(Y_k = 0) + 1 \cdot \Pr(Y_k = 1) = 0 + 1/2 = 1/2$$

and, by Theorem C.3(b) and the observation that $Y_k^2 = Y_k$,

$$\operatorname{var}(Y_k) = E(Y_k^2) - E(Y_k)^2 = E(Y_k) - E(Y_k)^2 = 1/2 - (1/2)^2 = 1/4.$$

Notice that $X = Y_1 + Y_2 + \cdots + Y_n$ and the $Y_k$ are independent since the coin is tossed randomly. By Theorem C.3(c) and (d), we have (C.4).
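As a numerical cross-check of (C.4) and of Chebyshev's inequality, here is a small sketch that works directly from the distribution $\Pr(X = r) = \binom{n}{r}2^{-n}$. The function names and the choice $n = 20$ are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from math import comb

def heads_distribution(n):
    """Pr(X = r) for X = number of heads in n fair tosses."""
    return {r: Fraction(comb(n, r), 2**n) for r in range(n + 1)}

def mean(dist):
    return sum(r * p for r, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((r - mu) ** 2 * p for r, p in dist.items())

def tail(dist, t):
    """Pr(|X - E(X)| > t * sqrt(var(X))), compared via squares to stay exact."""
    mu, var = mean(dist), variance(dist)
    return sum(p for r, p in dist.items() if (r - mu) ** 2 > t * t * var)

n = 20
dist = heads_distribution(n)
```

For this `n`, `mean(dist)` and `variance(dist)` come out to exactly $n/2$ and $n/4$, and `tail(dist, 2)` is far below the Chebyshev bound $1/4$, which illustrates that the inequality is usually not tight.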


APPENDIX  D 

Partial  Fractions 


We will discuss those aspects of partial fractions that are most relevant in enumeration. Although not necessary for our purposes, the theoretical background of the subject consists of two easily discussed parts, so we include it. The rest of this appendix is devoted to computational aspects of partial fractions.

Theory 


The  following  result  has  many  applications,  one  of  which  is  to  the  theory  of  partial  fractions. 

Theorem D.1 Fundamental Theorem of Algebra If $p(x)$ is a polynomial of degree $n$ whose coefficients are complex numbers, then $p(x)$ can be written as a product of linear factors, each of which has coefficients which are complex numbers.

We will not prove this.

In calculus classes, one usually uses a corollary of this theorem: If the coefficients of $p(x)$ are real numbers, then it is possible to factor $p(x)$ into a product of linear and quadratic factors, each of which has real coefficients. We will not use the corollary because it is usually more useful in combinatorics to write $p(x)$ as a product of linear factors.

By the Fundamental Theorem of Algebra, every polynomial $p(x)$ can be factored in the form

$$p(x) = Cx^n(1 - a_1x)^{n_1}(1 - a_2x)^{n_2} \cdots (1 - a_kx)^{n_k}, \tag{D.1}$$

where the $a_i$'s are distinct nonzero complex numbers. Although this can always be done, it is, in general, very difficult to do. In (D.1), the $a_i$'s are the reciprocals of the nonzero roots of $p(x) = 0$ and $n_i$ is the "multiplicity" of the root $1/a_i$.

Suppose that $p(x) = p_1(x)p_2(x) \cdots p_k(x)$ and $q(x)$ are polynomials such that

• the degree of $q(x)$ is less than the degree of $p(x)$;

• none of the $p_i(x)$ is a constant;

• no pair $p_i(x)$ and $p_j(x)$ have a common root.

The Chinese Remainder Theorem for polynomials asserts that there exist unique polynomials $q_1(x), \ldots, q_k(x)$ (depending on $q(x)$ and the $p_i(x)$) such that

• the degree of $q_i(x)$ is less than the degree of $p_i(x)$ for all $i$;

• if the coefficients of $q(x)$ and the $p_i(x)$ are rational, then so are the coefficients of the $q_i(x)$;

• $$\frac{q(x)}{p(x)} = \frac{q_1(x)}{p_1(x)} + \frac{q_2(x)}{p_2(x)} + \cdots + \frac{q_k(x)}{p_k(x)}.$$

This is called a partial fraction expansion of $q(x)/p(x)$. For combinatorial applications, we take the $p_i(x)$'s to be of the form $(1 - a_ix)^{n_i}$.

Suppose we have been able to factor $p(x)$ as shown in (D.1). In our applications, we normally have $n = 0$, so we will assume this is the case. We can also easily remove the factor of $C$ by dividing $q(x)$ by $C$. Let $p_i(x) = (1 - a_ix)^{n_i}$. With some work, we can obtain a partial fraction expansion for $q(x)/p(x)$. Finally, with a bit more work, we can rewrite $q_i(x)/(1 - a_ix)^{n_i}$ as

$$\frac{\beta_{i,1}}{1 - a_ix} + \frac{\beta_{i,2}}{(1 - a_ix)^2} + \cdots + \frac{\beta_{i,n_i}}{(1 - a_ix)^{n_i}},$$

where the $\beta_{i,j}$'s are complex numbers. (We will not prove this.)

Authors of calculus texts prefer a different partial fraction expansion for $q(x)/p(x)$. In the first place, as we already noted, they avoid complex numbers. This can be done by appropriately combining factors in (D.1). In the second place, they usually prefer that the highest degree term of each factor have coefficient 1, unlike combinatorialists who prefer that the constant term be 1.


Computations 


The computational aspect of partial fractions has two parts. The first is the factoring of a polynomial $p(x)$ and the second is obtaining a partial fraction expansion.

In general, factoring is difficult. The polynomials we deal with can be factored by using the factoring methods of basic algebra, including the formula for the roots of a quadratic equation. The latter is used as follows:

$$Ax^2 + Bx + C = A(x - r_1)(x - r_2) \quad \text{where} \quad r_1, r_2 = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}.$$

We will not review basic algebra methods. There is one unusual aspect to the sort of factoring we want to do in connection with generating functions. We want to factor $p(x)$ so that it is a product of a constant and factors of the form $1 - cx$. This can be done by factoring the polynomial $p(1/y)y^n$, where $n$ is the degree of $p(x)$. The examples should make this clearer.

Example D.1 A factorization Factor the polynomial $p(x) = 1 - x - 4x^3$. Following the suggestion, we look at

$$r(y) = p(1/y)y^3 = y^3 - y^2 - 4.$$

Since $r(2) = 0$, $y - 2$ must be a factor of $r(y)$. Dividing it out we obtain $y^2 + y + 2$. By the quadratic formula, the zeroes of $y^2 + y + 2$ are $(-1 \pm \sqrt{-7})/2$. Thus

$$r(y) = (y - 2)\left(y - \frac{-1 + \sqrt{-7}}{2}\right)\left(y - \frac{-1 - \sqrt{-7}}{2}\right).$$

Since $p(x) = x^3r(1/x)$, we finally have

$$p(x) = (1 - 2x)\left(1 - \frac{-1 + \sqrt{-7}}{2}x\right)\left(1 - \frac{-1 - \sqrt{-7}}{2}x\right). \quad \Box$$
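A factorization like this one is easy to sanity-check numerically. The sketch below, which assumes only Python's standard `cmath` module, compares $p(x) = 1 - x - 4x^3$ with the product of the three factors $1 - a_ix$ at a few sample points; the helper names and the sample points are illustrative.

```python
import cmath

def p(x):
    return 1 - x - 4 * x**3

# Reciprocals of the nonzero roots of p, i.e. the zeros of r(y) = y^3 - y^2 - 4.
a1 = 2
a2 = (-1 + cmath.sqrt(-7)) / 2
a3 = (-1 - cmath.sqrt(-7)) / 2

def factored(x):
    return (1 - a1 * x) * (1 - a2 * x) * (1 - a3 * x)

def max_gap(points):
    """Largest absolute difference between p and its factored form."""
    return max(abs(p(x) - factored(x)) for x in points)
```

Because the check uses floating-point complex arithmetic, agreement is only up to rounding error, so `max_gap` should be compared with a small tolerance rather than with zero.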
Example D.2 Partial fractions for a general quadratic We want to expand $q(x)/p(x)$ in partial fractions when $p(x)$ is a quadratic with distinct roots and $q(x)$ is of degree one or less. We will assume that $p(x)$ has been factored:

$$p(x) = (1 - ax)(1 - bx).$$

Since $p(x)$ has distinct roots, $a \ne b$.

Let us expand $1/p(x)$. We can write

$$\frac{1}{(1 - ax)(1 - bx)} = \frac{u}{1 - ax} + \frac{v}{1 - bx}, \tag{D.2}$$

where $u$ and $v$ are numbers that we must find. (If $a$ and $b$ are real, then $u$ and $v$ will be, too; however, if $a$ or $b$ is complex, then $u$ and $v$ could also be complex.) There are various ways we could find $u$ and $v$. We'll show you two methods.

The straightforward method is to clear (D.2) of fractions and then equate powers of $x$. Here's what happens: Since

$$1 = u(1 - bx) + v(1 - ax) = (u + v) - (bu + av)x,$$

we have

$$1 = u + v \quad \text{and} \quad 0 = -(bu + av).$$

The solution to these equations is $u = a/(a - b)$ and $v = b/(b - a)$.

Another method for solving (D.2) is to multiply it by $1 - ax$ and then choose $x$ so that $1 - ax = 0$; i.e., $x = 1/a$. After the first step we have

$$\frac{1}{1 - bx} = u + \frac{v(1 - ax)}{1 - bx}. \tag{D.3}$$

When we set $1 - ax = 0$, the last term in (D.3) disappears; that is why we chose that value for $x$. Substituting in (D.3), we get

$$u = \frac{1}{1 - b/a} = \frac{a}{a - b}.$$

By the symmetry of (D.2), $v$ is obtained simply by interchanging $a$ and $b$.
We have shown by two methods that

$$\frac{1}{(1 - ax)(1 - bx)} = \frac{\frac{a}{a - b}}{1 - ax} + \frac{\frac{b}{b - a}}{1 - bx}. \tag{D.4}$$

By either of these methods, one can show that

$$\frac{x}{(1 - ax)(1 - bx)} = \frac{\frac{1}{a - b}}{1 - ax} + \frac{\frac{1}{b - a}}{1 - bx}. \tag{D.5}$$

We leave this as an exercise. You can save yourself quite a bit of work in partial fraction calculations if you use (D.4) and (D.5). $\Box$
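Formulas (D.4) and (D.5) can be spot-checked with exact rational arithmetic. The following is a sketch under the assumption that $a$ and $b$ are rational (here $a = -2$, $b = 3$); the function names and the sample points are illustrative and chosen to avoid the poles $x = 1/a$ and $x = 1/b$.

```python
from fractions import Fraction

def lhs(numer_power, a, b, x):
    """x**numer_power / ((1 - a x)(1 - b x)), with numer_power 0 or 1."""
    return x**numer_power / ((1 - a * x) * (1 - b * x))

def rhs_d4(a, b, x):
    """Right side of (D.4)."""
    return (a / (a - b)) / (1 - a * x) + (b / (b - a)) / (1 - b * x)

def rhs_d5(a, b, x):
    """Right side of (D.5)."""
    return (1 / (a - b)) / (1 - a * x) + (1 / (b - a)) / (1 - b * x)

a, b = Fraction(-2), Fraction(3)
xs = [Fraction(k, 7) for k in range(-3, 4)]   # none equals -1/2 or 1/3
```

Agreement at finitely many points is not a proof, but two rational functions with these denominators that agree at this many points must be identical.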
Example D.3 A specific quadratic Let's expand

$$\frac{1 + 3x}{(1 + 2x)(1 - 3x)}$$

by partial fractions. Using (D.4) and (D.5) with $a = -2$ and $b = 3$, we easily obtain

$$\begin{aligned}
\frac{1 + 3x}{(1 + 2x)(1 - 3x)} &= \frac{1}{(1 + 2x)(1 - 3x)} + \frac{3x}{(1 + 2x)(1 - 3x)} \\
&= \frac{2/5}{1 + 2x} + \frac{3/5}{1 - 3x} + \frac{-3/5}{1 + 2x} + \frac{3/5}{1 - 3x} \\
&= \frac{-1/5}{1 + 2x} + \frac{6/5}{1 - 3x}.
\end{aligned} \tag{D.6}$$

To see how much effort has been saved, you are encouraged to derive this result without using (D.4) and (D.5). $\Box$

Example D.4 A factored cubic Let's expand

$$\frac{1 + 3x}{(1 - x)(1 + 2x)(1 - 3x)}$$

by partial fractions. We begin by factoring out $1 - x$ and using (D.6). Next we use some algebra and then apply (D.4) twice. Here it is.

$$\begin{aligned}
\frac{1 + 3x}{(1 - x)(1 + 2x)(1 - 3x)} &= \frac{1}{1 - x}\left(\frac{-1/5}{1 + 2x} + \frac{6/5}{1 - 3x}\right) \\
&= \frac{-1/5}{(1 - x)(1 + 2x)} + \frac{6/5}{(1 - x)(1 - 3x)} \\
&= \frac{(-1/5)(1/3)}{1 - x} + \frac{(-1/5)(2/3)}{1 + 2x} + \frac{(6/5)(-1/2)}{1 - x} + \frac{(6/5)(3/2)}{1 - 3x} \\
&= \frac{-2/3}{1 - x} + \frac{-2/15}{1 + 2x} + \frac{9/5}{1 - 3x}.
\end{aligned}$$

Notice how we were able to deal with the cubic denominator by iterating the method for dealing with a quadratic denominator. This will work in any situation as long as the denominator has no repeated factors. $\Box$


Example D.5 A squared quadratic Let's expand

$$\frac{1 + 3x}{(1 + 2x)^2(1 - 3x)^2}. \tag{D.7}$$

Before tackling this problem, let's do the simpler case where the numerator is one:

$$\frac{1}{(1 + 2x)^2(1 - 3x)^2} = \left(\frac{1}{(1 + 2x)(1 - 3x)}\right)^2. \tag{D.8}$$

Using (D.4), this becomes

$$\left(\frac{2/5}{1 + 2x} + \frac{3/5}{1 - 3x}\right)^2,$$

which can be expanded to

$$\frac{4/25}{(1 + 2x)^2} + \frac{9/25}{(1 - 3x)^2} + \frac{12/25}{(1 + 2x)(1 - 3x)}.$$

The first two terms are already in standard partial fraction form and you should have no trouble expanding the last.

How does this help us with (D.7) since we still have a $3x$ left in the numerator? We can cheat a little bit and allow ourselves to write the expansion of (D.7) as simply $1 + 3x$ times the expansion of (D.8). This does not cause any problems in the applications of partial fractions that we are interested in other than slightly more complicated answers. $\Box$

Example D.6 Another problem How can we expand $1/((1 - x)^2(1 - 2x))$ by partial fractions? Do any of our earlier tricks work? Yes, we simply write

$$\frac{1}{(1 - x)^2(1 - 2x)} = \frac{1}{1 - x}\left(\frac{1}{(1 - x)(1 - 2x)}\right)$$

and continue as in Example D.4. $\Box$

As  you  should  have  seen  by  now,  a  bit  of  cleverness  with  partial  fractions  can  save  quite  a  bit 
of  work.  Another  trick  worth  remembering  is  to  use  letters  in  place  of  complicated  numbers.  We 
conclude  with  an  example  which  illustrates  another  trick. 


*Example D.7 More tricks Expand

$$\frac{x + 6x^2}{1 - x - 4x^3} \tag{D.9}$$

in partial fractions.

We'll write the denominator as

$$(1 - 2x)(1 + x + 2x^2) = (1 - 2x)(1 - cx)(1 - dx)$$

where

$$c = \frac{-1 + \sqrt{-7}}{2} \quad \text{and} \quad d = \frac{-1 - \sqrt{-7}}{2},$$

a factorization found in Example D.1.

There  are  various  ways  we  can  attack  this  problem.  Let's  think  about  this. 

• An obvious approach is to use the method of Example D.4. The $6x^2$ causes a bit of a problem in the numerator because the first step in expanding

$$\frac{x + 6x^2}{1 + x + 2x^2}$$

is to divide so that the numerator is left as a lower degree than the denominator. If we take this approach, we should carry along $c$ and $d$ as much as possible rather than their rather messy values. Also, since they are zeros of $y^2 + y + 2$, we have the equations $c^2 = -c - 2$ and $d^2 = -d - 2$, which may help simplify some expressions. Also, $cd = 2$ and $c + d = -1$. We leave this approach for you to carry out.

• We could use the previous idea after first removing the factor of $x + 6x^2$ and then reintroducing it at the end. The justification for this is the last paragraph of Example D.5.

• Another approach is to write $x + 6x^2 = x(1 + 6x)$. We can obtain partial fractions for $(1 + 6x)/(1 + x + 2x^2)$ using (D.4) and (D.5). This result can be multiplied by $x/(1 - 2x)$ and expanded by (D.5).

• A different approach is to remove the $1 - 2x$ partial fraction term from (D.9), leaving a quadratic denominator. The resulting problem can then be done by our trusty formulas (D.4) and (D.5). Let's do this. We have

$$\frac{x + 6x^2}{(1 - 2x)(1 + x + 2x^2)} = \frac{u}{1 - 2x} + \frac{v}{1 - cx} + \frac{w}{1 - dx}. \tag{D.10}$$
Applying the trick of multiplying by $1 - 2x$ and setting $1 - 2x = 0$, we have

$$u = \left.\frac{x + 6x^2}{1 + x + 2x^2}\right|_{x = 1/2} = 1.$$

Subtracting $u/(1 - 2x)$ from both sides of (D.10), we obtain

$$\begin{aligned}
\frac{v}{1 - cx} + \frac{w}{1 - dx} &= \frac{x + 6x^2}{(1 - 2x)(1 + x + 2x^2)} - \frac{1}{1 - 2x} \\
&= \frac{-1 + 4x^2}{(1 - 2x)(1 + x + 2x^2)} \\
&= \frac{-1 - 2x}{1 + x + 2x^2}.
\end{aligned}$$

The last of these can now be expanded by (D.4) and (D.5). The cancellation of a factor of $1 - 2x$ from the numerator and denominator was not luck. It had to happen if our algebra was correct because we were removing the $1 - 2x$ partial fraction term. $\Box$
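The two steps of this last approach, finding $u$ and then checking what remains, can be replayed with exact fractions. This is only a sketch: the function names are illustrative and the sample points are an arbitrary choice that avoids the pole at $x = 1/2$.

```python
from fractions import Fraction

def original(x):
    """The fraction (D.9) with its denominator written in factored form."""
    return (x + 6 * x**2) / ((1 - 2 * x) * (1 + x + 2 * x**2))

# u comes from multiplying by (1 - 2x) and then setting x = 1/2.
half = Fraction(1, 2)
u = (half + 6 * half**2) / (1 + half + 2 * half**2)

def remainder(x):
    """What is left after the u/(1 - 2x) term is removed."""
    return (-1 - 2 * x) / (1 + x + 2 * x**2)

xs = [Fraction(k, 5) for k in range(-4, 5)]   # avoids the pole at x = 1/2
```

If the algebra in the example is right, `original(x)` equals `u / (1 - 2*x) + remainder(x)` at every sample point.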

Since computing partial fractions can involve a lot of algebra, it is useful to have an algebra package do the computations. If that's not feasible, it's a good idea to check your calculations. This can be done in various ways:

• Use a graphing calculator to plot the partial fraction expansion and the original fraction and see if they agree.

• Reverse the procedure: combine the partial fractions into one fraction and see if the result equals the original fraction.

• Compute several values to see if the value of the partial fraction expansion agrees with the value of the original fraction. For $p(x)/q(x)$, it suffices to compute $\max(\deg(q(x)), \deg(p(x)) + 1)$ values.
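The "reverse the procedure" check can be automated with a few lines of exact polynomial arithmetic. The sketch below is one possible way to do it, not a method from the text: polynomials are stored as coefficient lists (lowest degree first), and `combine` is a hypothetical helper that adds terms of the form $c/(1 - ax)$ over a common denominator. It is applied to the expansion found in Example D.4.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def poly_add(p, q):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [x + y for x, y in zip(p, q)]

def combine(terms):
    """Sum of c/(1 - a x) over (c, a) in terms, as (numerator, denominator)."""
    num, den = [Fraction(0)], [Fraction(1)]
    for c, a in terms:
        factor = [Fraction(1), -Fraction(a)]          # the polynomial 1 - a x
        num = poly_add(poly_mul(num, factor), [Fraction(c) * d for d in den])
        den = poly_mul(den, factor)
    return num, den

# Expansion from Example D.4: (-2/3)/(1-x) + (-2/15)/(1+2x) + (9/5)/(1-3x).
terms = [(Fraction(-2, 3), 1), (Fraction(-2, 15), -2), (Fraction(9, 5), 3)]
num, den = combine(terms)
```

Here `num` should hold the coefficients of $1 + 3x$ (possibly followed by zeros) and `den` those of $(1 - x)(1 + 2x)(1 - 3x) = 1 - 2x - 5x^2 + 6x^3$.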


If  you  wish  to  practice  working  problems  involving  partial  fractions,  look  in  the  partial  fractions 
section  of  any  complete  calculus  text. 
Section 1.1

1.1.1. We can form $n$ digit numbers by choosing the leftmost digit AND choosing the next digit AND $\cdots$ AND choosing the rightmost digit. The first choice can be made in 9 ways since a leading zero is not allowed. The remaining $n - 1$ choices can each be made in 10 ways. By the Rule of Product we have $9 \times 10^{n-1}$.

To count numbers with at most $n$ digits, we could sum up $9 \times 10^{k-1}$ for $1 \le k \le n$. The sum can be evaluated since it is a geometric series. This does not include the number 0. Whether we add 1 to include it depends on our interpretation of the problem's requirement that there be no leading zeroes. There is an easier way. We can pad out a number with less than $n$ digits by adding leading zeroes. The original number can be recovered from any such $n$ digit number by stripping off the leading zeroes. Thus we see by the Rule of Product that there are $10^n$ numbers with at most $n$ digits. If we wish to rule out 0 (which pads out to a string of $n$ zeroes), we must subtract 1.
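For small $n$ both counts can be confirmed by brute force. The sketch below is illustrative only; `count_exactly` is a hypothetical helper name.

```python
def count_exactly(n):
    """Count positive integers with exactly n digits by brute force."""
    return sum(1 for m in range(1, 10**n) if len(str(m)) == n)

def geometric_sum(n):
    """Sum of 9 * 10**(k-1) for 1 <= k <= n (numbers with at most n digits, 0 excluded)."""
    return sum(9 * 10 ** (k - 1) for k in range(1, n + 1))
```

The geometric sum omits 0, so it should equal $10^n - 1$, in agreement with the padding argument.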

1.1.3. List the elements of the set in any order: $a_1, a_2, \ldots, a_{|S|}$. We can construct a subset by
including $a_1$ or not AND
including $a_2$ or not AND
$\cdots$
including $a_{|S|}$ or not.

Since there are 2 choices in each case, the Rule of Product gives $2 \times 2 \times \cdots \times 2 = 2^{|S|}$.

1.1.5.  The  answers  are  SISITS  and  SISLAL.  We'll  come  back  to  this  type  of  problem  when  we  study 
decision  trees. 


Section  1.2 

1.2.1. If we want all assignments of birthdays to people, then repeats are allowed in the list mentioned in the hint. This gives $365^{30}$. If we want all birthdays distinct, no repeats are allowed in the list. This gives $365 \times 364 \times \cdots \times (365 - 29)$. The ratio is 0.29. How can this be computed? There are a lot of possibilities. Here are some.

• Use a symbolic math package.

• Write a computer program.

• Use a calculator. Overflow may be a problem, so you might write the ratio as $(365/365) \times (364/365) \times \cdots \times (336/365)$.

• Use (1.2). You are asked to do this in the next problem. Unfortunately, there is no guarantee how large the error will be.

• Use Stirling's formula after writing the numerator as $365!/335!$. Since Stirling's formula has an error guarantee, we know we are close enough. Computing the values directly from Stirling's formula may cause overflow. This can be avoided in various ways. One is to rearrange the various factors by using some algebra:

$$\frac{\sqrt{2\pi \cdot 365}\,(365/e)^{365}}{\sqrt{2\pi \cdot 335}\,(335/e)^{335}\,(365)^{30}} = \sqrt{365/335}\;\frac{(365/335)^{335}}{e^{30}}.$$

Another way is to compute the logarithm of Stirling's formula and use that to estimate the logarithm of the answer.
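Two of the suggestions above, multiplying the factors $(365 - i)/365$ one at a time and working with logarithms, can be sketched as follows. Note the assumption: the logarithmic version uses the standard library's `lgamma` (the logarithm of the gamma function, so `lgamma(m + 1)` is $\ln m!$) in place of Stirling's formula, which is a substitution made here for convenience rather than the method described in the text.

```python
from math import exp, lgamma, log, prod

def ratio_product(people=30, days=365):
    """Product (365/365)(364/365)...(336/365); every factor is at most 1."""
    return prod((days - i) / days for i in range(people))

def ratio_logs(people=30, days=365):
    """Same ratio via logarithms: log(365!/335!) - 30 log 365."""
    log_numer = lgamma(days + 1) - lgamma(days - people + 1)
    return exp(log_numer - people * log(days))
```

Neither version ever forms a huge intermediate number, so overflow is not an issue, and both agree with the stated value 0.29 to two decimal places.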

1.2.3. Each of the 7 letters ABMNRST appears once and each of the letters CIO appears twice. Thus we must form an ordered list from the 10 distinct letters. The solutions are

k = 2: 10 × 9 = 90
k = 3: 10 × 9 × 8 = 720
k = 4: 10 × 9 × 8 × 7 = 5040
1.2.5 (a) Since there are 5 distinct letters, the answer is $5 \times 4 \times 3 = 60$.

(b) Since there are 5 distinct letters, the answer is $5^3 = 125$.

(c) Either the letters are distinct OR one letter appears twice OR one letter appears three times. We have seen that the first can be done in 60 ways. To do the second, choose one of L and T to repeat, choose one of the remaining 4 different letters and choose where that letter is to go, giving $2 \times 4 \times 3 = 24$. To do the third, use T. Thus, the answer is $60 + 24 + 1 = 85$.

1.2.7 (a) push, push, pop, pop, push, push, pop, push, pop, pop. Remembering to start with something, say a on the stack: (a(bc))((de)f).

(b) This is almost the same as (a). The sequence is 112211212122 and the last "pop" in (a) is replaced by "push, pop, pop."

(c) a((b((cd)e))(fg)); push, push, push, pop, push, pop, pop, push, push, pop, pop, pop; 111010011000.

1.2.9. Stripping off the initial R and terminal F, we are left with a list of at most 4 letters, at least one of which is an L. There is just 1 such list of length 1. There are $3^2 - 2^2 = 5$ lists of length 2, namely all those made from E, I and L minus those made from just E and I. Similarly, there are $3^3 - 2^3 = 19$ of length 3 and $3^4 - 2^4 = 65$ of length 4. This gives us a total of 90.

The  letters  used  are  E,  F,  I,  L  and  R  in  alphabetical  order.  To  get  the  word  before  RELIEF, 
note  that  we  cannot  change  just  the  F  and/or  the  E  to  produce  an  earlier  word.  Thus  we  must 
change  the  I  to  get  the  preceding  word.  The  first  candidate  in  alphabetical  order  is  F,  giving  us 
RELF.  Working  backwards  in  this  manner,  we  come  to  RELELF,  RELEIF,  RELEF  and,  finally, 
RELEEF. 

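1.2.11. There are $n!/(n - k)!$ lists of length $k$. The total number of lists (not counting the empty list) is

$$\frac{n!}{(n-1)!} + \frac{n!}{(n-2)!} + \cdots + \frac{n!}{1!} + \frac{n!}{0!} = n!\left(\frac{1}{0!} + \frac{1}{1!} + \cdots + \frac{1}{(n-1)!}\right) = n!\sum_{i=0}^{n-1}\frac{1}{i!}.$$

Since $e = e^1 = \sum_{i=0}^{\infty} 1^i/i!$, it follows that the above sum is close to $n!\,e$.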
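How close the sum is to $n!\,e$ can be seen numerically. In the sketch below (the function name is illustrative), the gap $n!\,e - \sum_{k=1}^{n} n!/(n-k)!$ equals $n!\sum_{i \ge n} 1/i! = 1 + \frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \cdots$, which lies strictly between 1 and 2.

```python
from math import e, factorial

def total_lists(n):
    """Number of nonempty lists without repeats that can be made from an n-set."""
    return sum(factorial(n) // factorial(n - k) for k in range(1, n + 1))
```

So the absolute error stays bounded while $n!\,e$ grows rapidly, which is the sense in which the sum is "close."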
1.2.13.  We  can  only  do  parts  (a)  and  (d)  at  present. 

(a) A person can run for one of k offices or for nothing, giving k + 1 choices per person. By the
Rule  of  Product  we  get  (k  +  1)^. 


(d)   We  can  treat  each  office  separately.  There  are  2^  —  1  possible  slates  for  an  office:  any  subset  of 
the  set  of  candidates  except  the  empty  one.  By  the  Rule  of  Product  we  have  (2*"  —  1)'^. 
Section 1.3

1.3.1. After recognizing that $k = n\lambda$ and $n - k = n(1 - \lambda)$, it's simply a matter of algebra.

1.3.3. Choose values for pairs AND choose suits for the lowest value pair AND choose suits for the middle value pair AND choose suits for the highest value pair. This gives $\binom{13}{3}\binom{4}{2}^3 = 61{,}776$.

1.3.5. Choose the lowest value in the straight (A to 10) AND choose a suit for each of the 5 values in the straight. This gives $10 \times 4^5 = 10240$.

Although the previous answer is acceptable, a poker player may object since a "straight flush" is better than a straight, and we included straight flushes in our count. Since a straight flush is a straight all in the same suit, we only have 4 choices of suits for the cards instead of $4^5$. Thus, there are $10 \times 4 = 40$ straight flushes. Hence, the number of straights which are not straight flushes is $10240 - 40 = 10200$.

1.3.7. This is like Exercise 1.2.3, but we'll do it a bit differently. Note that EXERCISES contains 3 E's, 2 S's and 1 each of C, I, R and X. By the end of Example 1.18, we can use (1.4) with $n = 9$, $m_1 = 3$, $m_2 = 2$ and $m_3 = m_4 = m_5 = m_6 = 1$. This gives $9!/(3!\,2!) = 30240$.

It can also be done without the use of a multinomial coefficient as follows. Choose 3 of the 9 possible positions to use for the three E's AND choose 2 of the 6 remaining positions to use for the two S's AND put a permutation of the remaining 4 letters in the remaining 4 places. This gives us $\binom{9}{3} \times \binom{6}{2} \times 4!$.

The  number  of  eight  letter  arrangements  is  the  same.  To  see  this,  consider  a  9-list  with  the 
ninth  position  labeled  "unused." 
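All three counts, and the claim about eight letter arrangements, are small enough to confirm by enumeration. This is a brute-force sketch using only the standard library; the variable names are illustrative.

```python
from itertools import permutations
from math import comb, factorial

word = "EXERCISES"

# Multinomial coefficient 9!/(3! 2! 1! 1! 1! 1!).
multinomial = factorial(9) // (factorial(3) * factorial(2))

# Positions for the E's, then for the S's, then a permutation of the rest.
by_positions = comb(9, 3) * comb(6, 2) * factorial(4)

# Direct enumeration of distinct arrangements using 9 and 8 of the letters.
nine_letter = len(set(permutations(word)))
eight_letter = len(set(permutations(word, 8)))
```

Each enumeration walks through $9! = 362{,}880$ tuples, which takes well under a second on ordinary hardware.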

1.3.9.  Think  of  the  teams  as  labeled  and  suppose  Teams  1  and  2  each  contain  3  men.  We  can  divide 
the  men  up  in  (3  3      1)  ways  and  the  women  in  (2  2  3^3  i)  ways. 

We  must  now  count  the  number  of  ways  to  form  the  ordered  situation  from  the  unordered  one. 
Be  careful — it's  not  4!  x  2  as  it  was  in  the  example!  Thinking  as  in  the  early  card  example,  we  start 
out  two  types  of  teams,  say  M  or  F  depending  on  which  sex  predominates  in  the  team.  We  also  have 
two  types  of  referees.  Thus  we  have  two  M  teams,  two  F  teams,  and  one  each  of  an  F  referee  and 
an  M  referee.  We  can  order  the  two  M  teams  (2  ways)  and  the  two  F  teams  (2  ways),  so  there  are 
only  2x2  ways  to  order  and  so  the  answer  is  (3  3  2^2  1)^  J' 

1.3.11. The theorem is true when $k = 2$ by the binomial theorem with $x = y_1$ and $y = y_2$. Suppose that $k > 2$ and that the theorem is true for $k - 1$. Using the hint and the binomial theorem with $x = y_k$ and $y = y_1 + y_2 + \cdots + y_{k-1}$, we have that

$$(y_1 + y_2 + \cdots + y_k)^n = \sum_{j=0}^{n} \binom{n}{j}(y_1 + y_2 + \cdots + y_{k-1})^{n-j}y_k^j.$$

Thus the coefficient of $y_1^{m_1} \cdots y_k^{m_k}$ in this is $\binom{n}{m_k} = n!/((n - m_k)!\,m_k!)$ times the coefficient of $y_1^{m_1} \cdots y_{k-1}^{m_{k-1}}$ in $(y_1 + y_2 + \cdots + y_{k-1})^{n - m_k}$. When $n - m_k = m_1 + m_2 + \cdots + m_{k-1}$ the coefficient is $(n - m_k)!/(m_1!\,m_2! \cdots m_{k-1}!)$ and otherwise it is zero by the induction assumption. Multiplying by $\binom{n}{m_k}$, we obtain the theorem for $k$.
Section 1.4

1.4.1. The rows are 1,7,21,35,35,21,7,1 and 1,8,28,56,70,56,28,8,1.

1.4.3. Let $L(n, k)$ be the number of ordered $k$-lists without repeats that can be made from an $n$-set $S$. Form such a list by choosing the first element AND then forming a $k - 1$ long list using the remaining $n - 1$ elements. This gives $L(n, k) = nL(n - 1, k - 1)$.

Single out one item $x \in S$. There are $L(n - 1, k)$ lists not containing $x$. If $x$ is in the list, it can be in any of $k$ positions AND the rest of the list can be constructed in $L(n - 1, k - 1)$ ways. Thus

$$L(n, k) = L(n - 1, k) + kL(n - 1, k - 1).$$
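Both recursions can be checked against a direct count for small $n$ and $k$. In this sketch, `L` counts lists by enumeration with `itertools.permutations`, so it does not rely on either recursion; the function name follows the notation above.

```python
from itertools import permutations

def L(n, k):
    """Count k-lists without repeats from an n-set by direct enumeration."""
    return sum(1 for _ in permutations(range(n), k))
```

With this definition `L(0, 0)` is 1 (the empty list) and `L(n, k)` is 0 whenever `k > n`, which are the boundary values the recursions need.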

1.4.5.  The  only  way  to  partition  an  n  element  set  into  n  blocks  is  to  put  each  element  in  a  block 
by  itself,  so  S{n,  n)  =  1.  The  only  way  to  partition  an  n  element  set  into  one  block  is  to  put  all  the 
elements  in  the  block,  so  S'(n,  1)  =  1. 

The  only  way  to  partition  an  n  element  set  into  n  —  1  blocks  is  to  choose  two  elements  to  be  in 
a  block  together  an  put  the  remaining  n  —  2  elements  in  n  —  2  blocks  by  themselves.  Thus  it  suffices 
to  choose  the  2  elements  that  appear  in  a  block  together  and  so  S{n,n  —  1)  =  (2). 

The formula for S(n, n - 1) can also be proved using (1.9) and induction. The formula is correct
for n = 1 since there is no way to partition a 1-set and have no blocks. Assume true for n - 1. Use
the recursion, the formula for S(n - 1, n - 1) and the induction assumption for S(n - 1, n - 2) to
obtain

$$S(n, n-1) = S(n-1, n-2) + (n-1)S(n-1, n-1) = \binom{n-1}{2} + (n-1) \cdot 1 = \binom{n}{2},$$

which completes the proof.

Now for S(n, 2). Note that S(n, k) is the number of unordered lists of length k where the list
entries are nonempty subsets of a given n-set and each element of the set appears in exactly one
list entry. We will count ordered lists, which is k! times the number of unordered ones. We choose a
subset for the first block (first list entry) and use the remaining set elements for the second block.
Since an n-set has $2^n$ subsets, this would seem to give $2^n/2$; however, we must avoid empty blocks. In the
ordered case, there are two ways this could happen since either the first or second list entry could
be the empty set. Thus, we must have $2^n - 2$ instead of $2^n$, and so $S(n, 2) = (2^n - 2)/2 = 2^{n-1} - 1$.

Here is another way to compute S(n, 2). Look at the block containing n. Once it is determined,
the entire two block partition is determined. The block is one of the $2^{n-1}$ subsets of {1, ..., n - 1} with n
adjoined. Since something must be left to form the second block, the subset cannot be all of {1, ..., n - 1}.
Thus there are $2^{n-1} - 1$ ways to form the block containing n.

The formula for S(n, 2) can also be proved by induction using the recursion for S(n, k) and
the fact that S(n, 1) = 1, much as was done for S(n, n - 1).
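The four special values derived in 1.4.5 can be checked numerically. The following is a minimal sketch, assuming the recursion (1.9) is S(n, k) = S(n - 1, k - 1) + kS(n - 1, k), which is the form used in the induction above; the function name `stirling2` is ours.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of partitions of an n-set into k nonempty blocks."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    # Element n is either a block by itself or joins one of the k existing blocks.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


# Compare the recursion with the closed forms found in the solution.
for n in range(2, 12):
    assert stirling2(n, n) == 1
    assert stirling2(n, 1) == 1
    assert stirling2(n, n - 1) == comb(n, 2)
    assert stirling2(n, 2) == 2 ** (n - 1) - 1
```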

1.4.7. There are $\binom{n}{k}$ ways to choose the subset AND k ways to choose an element in it to mark.
This gives the left side of the recursion times k. On the other hand, there are n ways to choose
an element to mark from {1, 2, ..., n} AND $\binom{n-1}{k-1}$ ways to choose the remaining elements of the
k-element subset.

1.4.9 (b) Each office is associated with a nonempty subset of the people and each person must be
in exactly one subset. This is a partition of the set of candidates with each block corresponding
to an office. Thus we have an ordered partition of an n element set into k blocks. The answer is
k! S(n, k).

(c) This is like the previous part, except that some people may be missing. We use two methods.
First, let i people run for no offices. The remaining n - i can be partitioned in S(n - i, k) ways
and the blocks ordered in k! ways. Thus we get $\sum_{i \ge 0} \binom{n}{i} k!\, S(n-i, k)$. For the second method,
either everyone runs for an office, giving k! S(n, k), or some people do not run. In the latter
case, we can think of a partition with k + 1 labeled blocks where the labels are the k offices
and "not running." This gives (k + 1)! S(n, k + 1). Thus we have k! S(n, k) + (k + 1)! S(n, k + 1).
The last formula is preferable since it is easier to calculate from tables of Stirling numbers.

(e) Let T(p, k) be the number of solutions. Look at all the people running for the first k - 1 offices.
Let t be the number of these people. If t < p, then at least p - t people must be running
for the kth office since everyone must run for some office. In addition, any of these t people
could run for the kth office. By the Rule of Product, the number of ways we can have this
particular set of t people running for the first k - 1 offices and some people running for the kth
office is $T(t, k-1)\,2^t$. The set of t people can be chosen in $\binom{p}{t}$ ways. Finally, look at the case
t = p. In this case everyone is running for one of the first k - 1 offices. The only restriction we
must impose is that a nonempty set of candidates must run for the kth office. Putting all this
together, we obtain

$$T(p, k) = \sum_{t=1}^{p-1} \binom{p}{t} T(t, k-1)\, 2^t + T(p, k-1)(2^p - 1).$$

This recursion is valid for p ≥ 2 and k ≥ 2. The initial conditions are T(p, 1) = 1 for p > 0 and
T(1, k) = 1 for k > 0.

Notice that if "people" and "offices" are interchanged, the problem is not changed. Thus
T(p, k) = T(k, p) and a recursion could have been obtained by looking at offices that the first
p - 1 people run for. This would give us

$$T(p, k) = \sum_{t=1}^{k-1} \binom{k}{t} T(p-1, t)\, 2^t + T(p-1, k)(2^k - 1).$$
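The recursion for T(p, k) can be compared with a direct count for small cases. This sketch assumes our reading of the problem: a solution is a p × k array of 0's and 1's (person i runs for office j) in which no row and no column is all zero. The names `T` and `brute_force` are ours, and the t = 0 term is taken to be zero because k - 1 ≥ 1 offices need at least one candidate.

```python
from functools import lru_cache
from itertools import product
from math import comb


@lru_cache(maxsize=None)
def T(p: int, k: int) -> int:
    """Count via the recursion on the kth office."""
    if p == 0 or k == 0:
        return 0
    if p == 1 or k == 1:
        return 1  # initial conditions T(p, 1) = T(1, k) = 1
    partial = sum(comb(p, t) * T(t, k - 1) * 2 ** t for t in range(1, p))
    return partial + T(p, k - 1) * (2 ** p - 1)


def brute_force(p: int, k: int) -> int:
    """Count 0/1 arrays with no empty row (person) and no empty column (office)."""
    count = 0
    for bits in product((0, 1), repeat=p * k):
        rows = [bits[i * k:(i + 1) * k] for i in range(p)]
        everyone_runs = all(any(row) for row in rows)
        every_office_contested = all(any(row[j] for row in rows) for j in range(k))
        if everyone_runs and every_office_contested:
            count += 1
    return count
```

For p and k up to 3 the two counts agree, and the symmetry T(p, k) = T(k, p) is visible in the values.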

Section  1.5 

1.5.1. For each element, there are j + 1 choices for the number of repetitions, namely anything from
0 to j, inclusive. By the Rule of Product, we obtain $(j + 1)^{|S|}$.

1.5.3. To form an unordered list of length k with repeats from {1, 2, ..., n}, either form a list without
n OR form a list with n. The first can be done in M(n - 1, k) ways. The second can be done
by forming a k - 1 element list AND then adjoining n to it. This can be done in M(n, k - 1) × 1
ways. Initial conditions: M(n, 0) = 1 for n ≥ 0 and M(0, k) = 0 for k > 0.

1.5.5. Interpret the points between the ith and the (i + 1)st vertical bars as the balls in box i. Since
there are n + 1 bars, there are n boxes. Since there are (n + k - 1) - (n - 1) = k points, there are
k balls.

1.5.7. This exercise and the previous one are simply two different ways of looking at the same thing
since an unordered list with repetitions allowed is the same as a multiset. The nth item must appear
zero, one OR two times. The remaining n - 1 items must be used to form a list of length k, k - 1
or k - 2 respectively. This gives the three terms on the left. We generalize to the case where each
item is used at most j times: $T(n, k) = \sum_{i=0}^{j} T(n-1, k-i)$.

1.5.9 (a) We give two solutions. Both use the idea of inserting a ball into a tube in an arbitrary
position. To physically do this may require some manipulation of balls already in the tube.

1. Insert b - 1 balls into the tubes AND then insert the bth ball. There are i + 1 possible
places to insert this ball in a tube containing i balls. Summing this over all t tubes gives
us (b - 1) + t possible places to insert the bth ball. We have proved that

f(b, t) = f(b - 1, t)(b + t - 1).

Since f(1, t) = t, we can establish the formula by induction.

2. Alternatively, we can insert the first ball AND insert the remaining b - 1 balls. The first
ball has the effect of dividing the tube in which it is placed into two tubes: the part above
it and the part below. Thus

f(b, t) = t f(b - 1, t + 1),

and we can again use induction.

(b) We give two solutions:

Construct a list of length t + b - 1 containing each ball exactly once and containing t - 1
copies of "between tubes." This can be done in $\binom{t+b-1}{t-1}\, b!$ ways: choose the "between tubes"
and then permute the balls to place them in the remaining b positions in the list.

Alternatively, imagine an ordered b + t - 1 long list. Choose t - 1 positions to be divisions
between tubes AND choose how to place the b balls in the remaining b positions. This gives

$$\binom{b+t-1}{t-1} \times b!.$$


Section 2.2

2.2.3. The interchanges can be written as (1,3), (1,4) and (2,3). Thus the entire set gives 1 → 3 → 2,
2 → 3, 3 → 1 → 4 and 4 → 1. In cycle form this is (1,2,3,4). Thus five applications take 1 to 2.
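The tracing in 2.2.3 is mechanical enough to script. The sketch below applies the three interchanges in the order listed (the same left-to-right order used in the solution) and then iterates the resulting permutation; the helper name `apply_interchanges` is ours.

```python
def apply_interchanges(swaps, start):
    """Follow one element through a sequence of interchanges, left to right."""
    x = start
    for a, b in swaps:
        if x == a:
            x = b
        elif x == b:
            x = a
    return x


swaps = [(1, 3), (1, 4), (2, 3)]
perm = {x: apply_interchanges(swaps, x) for x in range(1, 5)}

# Apply the whole set of interchanges five times, starting from 1.
position = 1
for _ in range(5):
    position = perm[position]
```

Since the permutation is a single 4-cycle, five applications have the same effect as one.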

2.2.5 (a) This was done in Exercise 2.2.2, but we'll redo it. If f(k) = k, then the elements of {1, ..., n} - {k}
can be permuted in any fashion. This can be done in (n - 1)! ways. Since there are n! permutations,
the probability that f(k) = k is (n - 1)!/n! = 1/n. Hence the probability that f(k) ≠ k is
1 - 1/n.

(b) By the independence assumption, the probability that there are no fixed points is $(1 - 1/n)^n$.
One of the standard results in calculus is that this approaches 1/e as n → ∞. (You can prove
it by writing $(1 - 1/n)^n = \exp(\ln(1 - 1/n)/(1/n))$, setting 1/n = x and using l'Hôpital's Rule.)

(c) Choose the k fixed points AND construct a derangement of the remaining n - k. This gives us
$\binom{n}{k} D_{n-k}$. Now use $D_{n-k} \approx (n-k)!/e$.

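2.2.7. For 1 ≤ k ≤ n - 1, $E(|a_k - a_{k+1}|) = E(|i - j|)$, where the latter expectation is taken over all
i ≠ j in {1, ..., n}. Thus the answer is (n - 1) times the average of the n(n - 1) values of |i - j| and so

$$\text{answer} = \frac{n-1}{n(n-1)} \sum_{i \ne j} |i - j| = \frac{2}{n} \sum_{1 \le i < j \le n} (j - i), \qquad \text{proving (a)}$$

$$= \frac{2}{n} \sum_{j=1}^{n} \sum_{i=1}^{j} (j - i) = \frac{2}{n} \sum_{j=1}^{n} \frac{j(j-1)}{2} = \frac{1}{n} \left( \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2} \right) = \frac{n^2 - 1}{3}.$$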
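The value (n² - 1)/3 for 2.2.7 can be confirmed by brute force for small n. This is only a sanity check under the assumption that every permutation of {1, ..., n} is equally likely; the function name `average_total_jump` is ours.

```python
from fractions import Fraction
from itertools import permutations


def average_total_jump(n: int) -> Fraction:
    """Average of |a_1 - a_2| + ... + |a_(n-1) - a_n| over all permutations."""
    total = 0
    count = 0
    for a in permutations(range(1, n + 1)):
        total += sum(abs(a[k] - a[k + 1]) for k in range(n - 1))
        count += 1
    return Fraction(total, count)


# Exact rational arithmetic avoids any rounding doubts.
for n in range(2, 8):
    assert average_total_jump(n) == Fraction(n * n - 1, 3)
```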
Section 2.3

2.3.3. We can form the permutations of the desired type by first constructing a partition of n
counted by B(n, b) AND then forming a cycle from each block of the partition. The argument used
in Exercise 2.2.2 proves that there are (k - 1)! cycles of length k that can be made from a k-set.

2.3.5 (a) In the order given, they are 2, 1, 3 and 4.

(b) If f is associated with a partition B of n, then B is the coimage of f and so f determines B.

(c) See (b).

(d) The first is not since f(1) = 2 ≠ 1.

The second is: just check the conditions.

The third is not since f(4) - 1 = 2 > max(f(1), f(2), f(3)) = 1.

The fourth is: just check the conditions.

(e) In a way, this is obvious, but it is tedious to write out a proof. By definition f(1) = 1. Choose
k > 1 such that f(x) = k for some x. Let y be the least element of n for which f(y) = k. By
the way f is constructed, y is not in the same block with any t < y. Thus y is the smallest
element in its block and so f(y) will be the smallest number exceeding all the values that have
been assigned for f(t) with t < y. Thus the maximum of f(t) over t < y is k - 1 and so f is a
restricted growth function.

(f) The functions are given in one-line form with the corresponding partition below each one.

1111              1211              1223
{1,2,3,4}         {1,3,4}{2}        {1}{2,3}{4}

1112              1212              1231
{1,2,3}{4}        {1,3}{2,4}        {1,4}{2}{3}

1121              1213              1232
{1,2,4}{3}        {1,3}{2}{4}       {1}{2,4}{3}

1122              1221              1233
{1,2}{3,4}        {1,4}{2,3}        {1}{2}{3,4}

1123              1222              1234
{1,2}{3}{4}       {1}{2,3,4}        {1}{2}{3}{4}
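The list in (f) can be regenerated mechanically. The sketch below enumerates restricted growth functions in lexicographic order, using the condition from (d) and (e) that f(1) = 1 and each later value is at most one more than the largest earlier value, and then reads off the coimage. The function names are ours.

```python
def restricted_growth_functions(n: int):
    """Yield all restricted growth functions on {1, ..., n} in one-line form."""
    def extend(prefix, biggest):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # The next value may repeat an old label or open exactly one new label.
        for value in range(1, biggest + 2):
            yield from extend(prefix + [value], max(biggest, value))

    yield from extend([1], 1)


def to_partition(f):
    """Return the coimage of f: block number label holds every x with f(x) = label."""
    blocks = {}
    for element, label in enumerate(f, start=1):
        blocks.setdefault(label, []).append(element)
    return [blocks[label] for label in sorted(blocks)]


all_functions = list(restricted_growth_functions(4))
```

For n = 4 this produces the 15 functions of the table, one for each partition of a 4-set.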


2.3.7. The coimage is a partition of A into at most |B| blocks, so our bound is 1 + (|A| - 1)/|B|.

2.3.9. If s < t and f(s) = f(t), that tells us that we cannot put a_s at the start of the longest
decreasing subsequence starting with a_t to obtain a decreasing subsequence. (If we could, we'd have
f(s) ≥ f(t) + 1.) Thus, a_s ≤ a_t. Hence the subsequence a_i, a_j, ... constructed in the problem is
increasing.

Now we're ready to start the proof. If there is a decreasing subsequence of length n + 1 we are
done. If there is no such subsequence, f maps {1, ..., ℓ} into {1, ..., n}. By the generalized Pigeonhole Principle, there is
some k such that f(t) = k for at least ℓ/n values of t. Thus it suffices to have ℓ/n > m. In other
words ℓ > mn.

2.3.11. Let the elements be s_1, ..., s_n, let t_0 = 0 and let t_i = s_1 + ... + s_i for 1 ≤ i ≤ n. By the
Pigeonhole Principle, two of the t's have the same remainder on division by n, say t_j and t_k with
j < k. It follows that t_k - t_j = s_{j+1} + ... + s_k is a multiple of n.
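The pigeonhole argument in 2.3.11 is constructive, and the following sketch turns it into a search for the consecutive block s_{j+1}, ..., s_k. The function name `divisible_block` and the sample list are ours, chosen only for illustration.

```python
def divisible_block(s):
    """Return consecutive entries of s whose sum is a multiple of n = len(s)."""
    n = len(s)
    first_index = {0: 0}  # remainder of t_j  ->  smallest j seen with that remainder
    t = 0
    for k in range(1, n + 1):
        t += s[k - 1]          # t is now t_k = s_1 + ... + s_k
        r = t % n
        if r in first_index:   # t_j and t_k agree modulo n
            j = first_index[r]
            return s[j:k]      # the entries s_(j+1), ..., s_k
        first_index[r] = k
    raise AssertionError("unreachable: n + 1 partial sums share n remainders")


block = divisible_block([3, 1, 4, 1, 5])
```

There are n + 1 partial sums and only n remainders, so the loop always returns before it finishes.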
Section 2.4

2.4.1. x(x + y) = xx + xy = x + xy = x.

2.4.3.  We  state  the  laws  and  whether  they  are  true  or  false.  If  false  we  give  a  counterexample. 

(a) x + (yz) = (x + y)(x + z) is true. (Proved in text.)

(b) x(y ⊕ z) = (xy) ⊕ (xz) is true.

(c) x + (y ⊕ z) = (x + y) ⊕ (x + z) is false with x = y = z = 1.

(d) x ⊕ (yz) = (x ⊕ y)(x ⊕ z) is false with x = y = 1, z = 0.
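With only three variables, each law in 2.4.3 can be tested on all eight assignments. In this sketch `|`, `&` and `^` on the integers 0 and 1 stand for +, product and ⊕; the dictionary `laws` and the helper `counterexamples` are our own names.

```python
from itertools import product

# Each entry returns True when the two sides of the law agree at (x, y, z).
laws = {
    "a": lambda x, y, z: (x | (y & z)) == ((x | y) & (x | z)),
    "b": lambda x, y, z: (x & (y ^ z)) == ((x & y) ^ (x & z)),
    "c": lambda x, y, z: (x | (y ^ z)) == ((x | y) ^ (x | z)),
    "d": lambda x, y, z: (x ^ (y & z)) == ((x ^ y) & (x ^ z)),
}


def counterexamples(law):
    """List every assignment of (x, y, z) at which the law fails."""
    return [values for values in product((0, 1), repeat=3) if not law(*values)]
```

An empty list means the law holds; for (c) and (d) the list contains the counterexamples quoted above, among others.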

2.4.5.  We  use  algebraic  manipulation.  Each  step  involves  a  simple  formula,  which  we  will  not  bother 
to  mention.  You  could  also  write  down  the  truth  table,  read  off  a  disjunctive  normal  form  and  try 
to  reduce  the  number  of  terms. 

(a) (x ⊕ y)(x + y) = (xy' + x'y)(x + y) = xy' + x'yx + xy'y + x'y = xy' + x'y. Note that this
is x ⊕ y.

(b) (x + y) ⊕ z = (x + y)z' + (x + y)'z = xz' + yz' + x'y'z.

(c) (x + y + z) ⊕ z = (x + y + z)z' + (x + y + z)'z = xz' + yz' + x'y'z'z = xz' + yz'.

(d) (xy) ⊕ z = xyz' + (xy)'z = xyz' + x'z + y'z.

2.4.7. There are many possible answers. A complicated one comes directly from the truth table
and contains 8 terms. The simplest form is xw + yw + zw + xyz. This can be obtained as follows.
(x + y + z)w will give the correct answer except when x = y = z = 1 and w = 0. Thus we could simply
add the term xyzw'. By noting that it is okay to add xyz when w = 1, we obtain (x + y + z)w + xyz.

Section  3.1 

3.1.1.  From  the  figures  in  the  text,  we  see  that  they  are  123,  132  and  321. 

3.1.3.  We  will  not  draw  the  tree.  The  root  is  1,  the  vertices  on  the  next  level  are  21  and  12  (left  to 
right).  On  the  next  level,  321,  231,  213,  312,  132,  and  123.  Finally,  the  leaves  are  4321,  3421,  3241, 
3214,  4231,  2431,  2341,  2314,  4213,  2413,  2143,  2134,  and  so  on. 

(a)  7  and  16. 

(b)  2,4,3,1  and  3,1,2,4. 

3.1.5.  We  will  not  draw  the  tree.  There  are  nine  sequences:  ABABAB,  ABABBA,  ABBABA, 
ABBABB,  BABABA,  BABABB,  BABBAB,  BBABAB  and  BBABBA. 

3.1.7.  We  will  not  draw  the  tree. 

(a)  5  and  18. 

(b)  111  and  433. 

(c)  4,4,4  has  rank  19. 

(e) The decision tree for the strictly decreasing functions is interspersed. To find it, discard the
leftmost branch leading out of each vertex except the root and then discard those decisions
that no longer lead to a leaf of the original tree.

3.1.9. We assume that you are looking at decision trees in the following discussion.

(a)  The  permutation  of  rank  0  is  the  leftmost  one  in  the  tree  and  so  each  element  is  inserted  as 
far  to  the  left  as  possible.  Thus  the  answer  is  n,  (n  —  1), . . . ,  2, 1. 

The  permutation  of  rank  n!  —  1  is  the  rightmost  one  in  the  tree  and  so  each  element  is 
inserted  as  far  to  the  right  as  possible.  Thus  the  answer  is  1, 2, 3, . . . ,  n. 

We  now  look  at  n!/2.  Note  that  the  decision  about  where  to  insert  2  splits  the  tree  into 
two  equal  pieces.  We  are  interested  in  the  leftmost  leaf  of  the  righthand  piece.  The  righthand 
piece  means  we  take  the  branch  1, 2.  To  stay  to  the  left  after  that,  3  through  n  are  inserted  in 
the  leftmost  position.  Thus  the  permutation  is  n,  (n  —  1), . . . ,  4, 3, 1, 2. 

(b)  The  permutation  of  rank  0  is  the  leftmost  one  in  the  tree  and  so  each  element  is  inserted  as 
far  to  the  left  as  possible.  It  begins  2,1.  Then  3  "bumps"  2  to  the  end:  3,1,2.  Next  4  "bumps" 
3  to  the  end:  4,1,2,3.  In  general,  we  have  n,  1, 2, 3, . . . ,  (n  —  1). 

The  permutation  of  rank  n!  —  1  is  the  rightmost  one  in  the  tree  and  so  each  element  is 
inserted  as  far  to  the  right  as  possible.  Thus  the  answer  is  1,2,3,...,  n. 

We  now  look  at  n!/2.  Note  that  the  decision  about  where  to  insert  2  splits  the  tree  into 
two  equal  pieces.  We  are  interested  in  the  leftmost  leaf  of  the  righthand  piece.  The  righthand 
piece  means  we  take  the  branch  1,  2.  To  stay  to  the  left  after  that,  3  through  n  are  inserted  in 
the leftmost position. This leads to "bumping" as it did for rank 0. Thus the permutation is
n, 2, 1, 3, 4, 5, ..., (n - 1).

(c) You should be able to see that the permutation (1, 2, 3, ..., n) has rank 0 in both cases and
that the permutation (n, ..., 3, 2, 1) has rank n! - 1 in both cases.

First suppose that n = 2m, an even number. It is easy to see how to split the tree in half
based on the first decision as we did for insertion order: Choose m + 1 and then stay as left as
possible. This means everything is in order except for m + 1. Thus the permutation is m + 1
followed by the elements of {1, ..., n} - {m + 1} in ascending order.

Now suppose that n = 2m - 1. In this case, we must make the middle choice, m, and
split the remaining tree in half, going to the leftmost leaf of the right part. If you look at some
trees, you should see that this leads to the permutation m, m + 1 followed by the elements of
{1, ..., n} - {m, m + 1} in ascending order.

3.1.11 (a) We'll make a decision based on whether or not the pair in the full house has the same
face value as a pair in the second hand. If it does not, there are

possible second hands. If it does, there are

possible second hands. Adding these up and multiplying by the number of possible full houses
(79,926) gives us about 3 x 10* hands.

(b) There are various ways to do this. The decision trees are all more complicated than in the
previous part.

(c) The order in which things are done can be very important.

3.1.13. You can simply modify the decision tree in Figure 3.5 as follows: Decrease the "number of
singles" values by 1 (since the desired word is one letter shorter). Throw away those that become
negative; i.e., erase leaves C and H. Add a new path that has no triples, one pair and five singles.
Call the new leaf X. It is then necessary to recompute the numbers. Here are the results, which total
to 113,540:

-  (o)C)G)G,.m,m)-^'- 

-  G)G)G)G,J,.0-- 

-  G)G)G)G..^O-- 

F:  2,520 


IJ  \2J  \0J  \3,  2,  2 
G  :    I  ?  :  (     -     I  ^  560. 


2/  voy \1J  V3,  3,  1 


Section  3.2 

3.2.1. We use the rank formula in the text and, for unranking, a greedy algorithm.

(-)    (3)  +  (2)  +  (?)  =  133.       («)  +  il)  +  Q  +  0  =  81. 

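(b) We have $35 = \binom{7}{4}$ so the first answer is 8,3,2,1. The second answer is 12,9,6,5 because

$$\binom{11}{4} \le 400 < \binom{12}{4}, \qquad 400 - \binom{11}{4} = 70,$$
$$\binom{8}{3} \le 70 < \binom{9}{3}, \qquad 70 - \binom{8}{3} = 14,$$
$$\binom{5}{2} \le 14 < \binom{6}{2}, \qquad 14 - \binom{5}{2} = 4,$$
$$\binom{4}{1} \le 4 < \binom{5}{1}.$$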

(c)  9,6,4,2,1  and  9,7,2,1. 

(d)  9,5,4,3,2  and  9,6,5,3. 
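The greedy unranking used in 3.2.1(b) can be sketched in a few lines. We assume here that the rank formula for a strictly decreasing function f of length k is the sum of $\binom{f(i)-1}{k+1-i}$ over i = 1, ..., k, which is consistent with the two answers in (b); the function names `rank` and `unrank` are ours.

```python
from math import comb


def rank(f):
    """Rank of a strictly decreasing list f = [f(1), ..., f(k)]."""
    k = len(f)
    # With i counted from 0, the lower index k + 1 - (i + 1) becomes k - i.
    return sum(comb(value - 1, k - i) for i, value in enumerate(f))


def unrank(r, k):
    """Greedy inverse of rank: peel off the largest possible binomial each time."""
    f = []
    for j in range(k, 0, -1):
        value = j  # comb(j - 1, j) == 0, so this starting value always fits
        while comb(value, j) <= r:
            value += 1
        f.append(value)
        r -= comb(value - 1, j)
    return f
```

For example, `unrank(400, 4)` retraces the chain of inequalities displayed in (b).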

3.2.3. One can compute the ranks by looking at the decision tree or by using the formula in Theorem 3.3.
We choose the latter approach. In case (j), we have f(i) = k + j - i. (This is easily checked
since this f clearly decreases by 1 as i increases by 1 and it gives f(1) = k, k + 1 and k + 2 for j = 1,
2 and 3, respectively.) By the theorem,

$$\mathrm{RANK}(f) = \sum_{i=1}^{k} \binom{f(i) - 1}{k + 1 - i} = \sum_{i=1}^{k} \binom{k + j - i - 1}{k + 1 - i}.$$

When j = 1, all the binomial coefficients are 0 and so the answer for the first function is 0.
When j = 2, all the binomial coefficients are 1 and so the answer for the second function is k.
When j = 3, we have

$$\mathrm{RANK}(f) = \sum_{i=1}^{k} \binom{k + 2 - i}{k + 1 - i} = \sum_{i=1}^{k} (k + 2 - i) = (k + 1) + k + (k - 1) + \cdots + 2.$$

Since the sum of the first n positive integers is $n(n+1)/2$, the rank is $\frac{(k+1)(k+2)}{2} - 1 = \frac{k(k+3)}{2}$.
Figure S.3.1 The decision tree for 4 × 4 standard Latin Squares in Exercise 3.3.1.


3.2.5 (a) $D_1 \times (n-1)! + D_2 \times (n-2)! + \cdots + D_{n-1} \times 1! = \sum_{k=1}^{n-1} D_k (n-k)!$.

(b) Denote the permutation by f. Let L = {1, ..., n}. For i = 1, 2, ..., n - 1 in order: let D_i be the number
of elements in L which are less than f(i) and replace L with L - {f(i)}.

(c)  The  decision  sequences  are  4,4,0,1,1  and  5,1,2,0,0  and  so  the  ranks  are  579  and  636. 

(d)  By  a  greedy  algorithm  we  get  the  decision  sequences  1,1,1,0,1  and  2,2,2,0,0.  The  permutations 

are  2,3,4,1,6,5  and  3,4,5,1,2,6. 
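Parts (a), (b) and (d) of 3.2.5 fit together as a ranking and unranking procedure, sketched below. The decision sequence is computed exactly as in (b) and the rank as in (a); the greedy unranking is written with `divmod`, which is our implementation choice, and all three function names are ours.

```python
from math import factorial


def decision_sequence(perm):
    """D_i = number of elements still in L that are less than f(i)."""
    remaining = sorted(perm)
    sequence = []
    for value in perm[:-1]:
        sequence.append(remaining.index(value))
        remaining.remove(value)
    return sequence


def rank(perm):
    """Sum of D_k * (n - k)! over k = 1, ..., n - 1."""
    n = len(perm)
    decisions = decision_sequence(perm)
    return sum(d * factorial(n - k) for k, d in enumerate(decisions, start=1))


def unrank(r, n):
    """Greedy inverse: take the largest D_k with D_k * (n - k)! <= r, in order."""
    remaining = list(range(1, n + 1))
    perm = []
    for k in range(1, n):
        d, r = divmod(r, factorial(n - k))
        perm.append(remaining.pop(d))
    perm.append(remaining[0])
    return perm
```

The decision sequences 1,1,1,0,1 and 2,2,2,0,0 of part (d) correspond to ranks 151 and 300 by the formula in (a), so `unrank(151, 6)` and `unrank(300, 6)` should reproduce the two permutations given there.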

3.2.9. 00000000000000000000 = $0^{20}$; 11000000000000000000 = $1^2 0^{18}$; 0100; 10101100.


Section  3.3 

3.3.1. When building an n × n Latin Square, if the first n - 1 rows have been filled in, then the
last  row  is  determined.  Thus  we'll  omit  it  from  the  decision  tree.  The  tree  is  shown  in  Figure  S.3.1. 

3.3.3.  You  should  find  14  solutions. 


Section  4.1 

4.1.1.  The  Venn  diagrams  each  consist  of  two  intersecting  circles. 

(a) $V_2 \cap V_3$ contains words of the form CVVC. We are interested in $V_2 \cup V_3$, the union of the
circles. Thus

$$|V_2 \cup V_3| = |V_2| + |V_3| - |V_2 \cap V_3| = 21^2 \times 5 \times 26 + 21^2 \times 5 \times 26 - 21^2 \times 5^2 = 21^2 \times 5 \times 47.$$

(b) We want all 4 letter words beginning and ending with consonants that are not in $C_2 \cap C_3$,
which is $21^2 \times 26^2 - 21^4$.
4.1.3 (a) If everyone who lost an eye also lost an arm, a leg and an ear, then there would be 70
people  who  lost  all  four. 

(b) Let A be the set of people who lost an arm and L the set who lost a leg. How small can A ∩ L
be? We have

|A ∩ L| = |A| + |L| - |A ∪ L| = 165 - |A ∪ L| ≥ 165 - 100 = 65.

We can now look at the set D = A ∩ L of double amputees and ask how many must have lost
an eye. As above, we have

|D ∩ I| = |D| + |I| - |D ∪ I| ≥ 65 + 70 - 100 = 35,

where I is the set of people who have lost an eye. Finally, we combine these people with the
75 who have lost an ear to conclude that at least 35 + 75 - 100 = 10 must have lost all four.
Thus p ≥ 10. We can achieve this by insisting that everyone lost at least three things. If the
people are numbered 1-100, we can do it as follows:

lost arm: 1-80

lost leg: 1-65 and 81-100

lost eye: 1-35 and 66-100

lost ear: 1-10 and 36-100
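The assignment at the end of 4.1.3(b) is easy to verify with sets. In this sketch the helper `span` is our own name; it builds a set of people from inclusive ranges of the numbering 1-100.

```python
def span(*ranges):
    """Union of inclusive integer ranges, e.g. span((1, 3), (7, 8)) = {1, 2, 3, 7, 8}."""
    people = set()
    for low, high in ranges:
        people.update(range(low, high + 1))
    return people


arm = span((1, 80))
leg = span((1, 65), (81, 100))
eye = span((1, 35), (66, 100))
ear = span((1, 10), (36, 100))

all_four = arm & leg & eye & ear

# How many of the four things each person lost.
losses = {person: sum(person in group for group in (arm, leg, eye, ear))
          for person in range(1, 101)}
```

Exactly the people numbered 1-10 lose all four, and everyone else loses exactly three, which is why the bound of 10 is attained.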


4.1.5 (a) A number x has a factor in common with N if and only if it is divisible by one of the
primes that divide N. Thus an element of {1, ..., N} has no factor in common with N if and only if it
is in none of the sets $S_k$.

(b) The intersection on the left side is the set of x ∈ {1, ..., N} that are multiples of $b = p_{i_1} \cdots p_{i_r}$. These
are b, 2b, 3b, ..., (N/b)b. Thus the set has N/b elements, as was to be proved.

(c) By (4.3) and the previous result, we have

$$\varphi(N) = \sum_{I \subseteq \underline{n}} (-1)^{|I|} \frac{N}{\prod_{i \in I} p_i} = N \sum_{I \subseteq \underline{n}} \prod_{i \in I} \left( -\frac{1}{p_i} \right).$$

Replacing $x_i$ by $-1/p_i$ in Example 1.14, we obtain the desired result.
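The inclusion and exclusion count in 4.1.5 can be compared with a direct count of the integers in {1, ..., N} having no factor in common with N. This is a sketch with names of our own choosing (`prime_divisors`, `phi_by_inclusion_exclusion`, `phi_direct`); it uses the fact from (b) that the multiples of a product b of distinct prime divisors number N/b.

```python
from itertools import combinations
from math import gcd, prod


def prime_divisors(N: int):
    """Distinct primes dividing N, found by trial division."""
    primes, d = [], 2
    while d * d <= N:
        if N % d == 0:
            primes.append(d)
            while N % d == 0:
                N //= d
        d += 1
    if N > 1:
        primes.append(N)
    return primes


def phi_by_inclusion_exclusion(N: int) -> int:
    """Alternating sum of N / (product of r chosen primes) over all choices."""
    primes = prime_divisors(N)
    total = 0
    for r in range(len(primes) + 1):
        for chosen in combinations(primes, r):
            total += (-1) ** r * (N // prod(chosen))  # prod(()) is 1
    return total


def phi_direct(N: int) -> int:
    """Count x in {1, ..., N} with gcd(x, N) = 1."""
    return sum(1 for x in range(1, N + 1) if gcd(x, N) == 1)
```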

4.1.7. Let $S_i$ be those lists in which $c_i$ is adjacent to $c_i$. Consider a list in $S_{i_1} \cap \cdots \cap S_{i_r}$. Using the
hint, this can be thought of as a list made from 2m - r symbols, where for the present we regard the
two occurrences of the symbol $c_j$ as different. Since the list is a rearrangement of the symbols, there
are (2m - r)! such lists. However, m - r pairs of the symbols are identical and we have treated them
as different. There are $2^{m-r}$ ways to treat such symbols as different. Thus $N_r = \binom{m}{r}(2m - r)!/2^{m-r}$.

4.1.9. The proof is practically the same as that given for Theorem 4.1. Instead of asking how much
s ∈ S contributes to the sums, ask how much Pr(s) contributes.

4.1.11 (a) The products are 1 or 0 according as s belongs to precisely the sets $S_i$, i ∈ K, or not.
Thus the inner sum is 1 or 0 according as s belongs to precisely k sets or not.

(b) Simply expand using the distributive law as in the previous exercise.

(c) The first part is just a rearrangement: Instead of choosing K and then J, first choose L
(corresponding to J ∪ K) and then choose K. The second part arises because there are $\binom{|L|}{k}$
ways to choose K.

(d) Move the sum over s ∈ S inside the other sums and collect terms according to |L|.

4.1.13 (a) Let the notation be as in the proof of the Principle of Inclusion and Exclusion. The proof

given  in  the  text  is  easily  adjusted  to  prove  s  contributes  exactly  Ct-i{X)  to  J^iZoi^^T ^i- 
Thus  the  sum  will  be  a  lower  bound  when  t  is  even  and  an  upper  bound  when  t  is  odd.  Including 
the  term  (— l)*^^  in  the  sum  changes  upper  bounds  to  lower  bounds  and  vice  versa  since  we 
are  now  considering  Ct{X).  By  considering  the  cases  of  t  even  and  t  odd  separately,  it  is  easy 

to  see  that  the  inequalities  follow. 

(b)    This  can  be  proved  by  induction  on  t  using  ('^')  =  ('^[~^)  +  (''^l^^)- 

4.1.15 (a) Let m = 2. Initially the N array contains

2: N_2     1: N_1     0: N_0.

With j = 0, we do i = 1 and then i = 0. The N array now contains

2: N_2     1: N_1 - N_2     0: N_0 - (N_1 - N_2).

With j = 1, we obtain

2: N_2     1: (N_1 - N_2) - N_2     0: N_0 - (N_1 - N_2).

Equation (4.16) gives

E_2 = N_2     E_1 = N_1 - 2N_2     E_0 = N_0 - N_1 + N_2,

which agrees with the values computed by the algorithm. You can carry out similar calculations
for m = 3.

(b)  This  can  be  done  by  carefully  carrying  out  the  steps  in  the  algorithm. 

(c) After no iterations (that is, at the start of the algorithm), $N_r$ contains s as many times as there
are sets of r indices for which (4.17) is true. If s appears in exactly p of the $S_i$, this number is $\binom{p}{r}$.
We  now  use  induction  on  t,  having  done  the  case  t  =  0.  After  t  —  1  iterations,  formula  (4.18) 
is  true  when  t  is  replaced  by  anything  smaller  in  it.  In  particular,  it  holds  with  t  replaced  by 
t-1. 

We  must  now  focus  on  the  inner  loop  of  the  algorithm.  What  does  it  do?  Since  never 
changes,  neither  does  N*-^.  Formula  (4.18)  gives  0  or  1  for  all  t  according  as  p  <  m  or  p  =  m 
(p  >  m  is  impossible).  This  is  the  correct  answer  for  both  A^^  and  Em- 

Back  to  the  action  of  the  inner  loop.  Again  we  can  prove  it  by  induction,  but  now  we  are 
going  from  N*-^  down  to  Nq  .  We  dealt  with  A'^  in  the  previous  paragraph.  If  the  inner  loop 
has  done  the  correct  thing  with  A^^+i)  then  the  number  of  times  s  appears  in  the  new  version 
of  N*  is  ii{p,  r,t—l)  —  iJ,{p,  r+l,t).  There  are  various  cases  to  consider.  We'll  just  look  at  one, 
namely  {^ij^zi})  -  ((^7^,).  Using  ("+^)  =  (^)  +  we  have 

\r-{t-l))     \{r+l)-t)        \r-t  +  l)     \r-t+l)  \r-t)' 

which  is  what  we  needed  to  prove.  We  leave  the  other  cases  in  (4.18)  to  you.  The  last  sentence 
in  the  exercise  follows  from  the  fact  that  all  the  numbers  we  calculate  are  nonnegative.  (This 
takes  care  of  the  problem  of  how  we  should  interpret  the  multiset  difference  A  —  B  ii  s  appears 
more  often  in  B  than  it  does  in  A.) 

When  t  >  m,  the  only  time  the  binomial  coefficient  is  used  in  (4.18)  is  when  t  =  p  =  m 
and  it  then  has  the  value  (^^^),  which  is  zero  unless  r  =  m,  when  it  is  1.  Thus,  for  t  >  m, 
li{p,  r,  t)  equals  lifr  —  p  and  0  otherwise.  Hence  N*  is  a  set  containing  precisely  those  elements 
that  are  in  exactly  r  of  the  Sj  . 

(d)  This  is  implicit  in  the  proof  for  (c) . 
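The trace in 4.1.15(a) suggests the following loop structure, which we sketch here on numbers rather than multisets. The loop bounds (j from 0 to m - 1, and i from m - 1 down to j) are inferred from the m = 2 trace above and are an assumption, not a quotation of the algorithm in the text; the function names and the sample sets are ours.

```python
from itertools import combinations


def intersection_sums(sets, universe):
    """N_r = sum of |intersection| over all r-subsets of the sets; N_0 = |universe|."""
    N = [len(universe)]
    for r in range(1, len(sets) + 1):
        N.append(sum(len(set.intersection(*chosen)) for chosen in combinations(sets, r)))
    return N


def exact_counts(N):
    """Turn N_0, ..., N_m into E_0, ..., E_m by repeated in-place subtraction."""
    E = list(N)
    m = len(E) - 1
    for j in range(m):
        for i in range(m - 1, j - 1, -1):
            E[i] -= E[i + 1]
    return E


universe = set(range(1, 7))
sets = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
N = intersection_sums(sets, universe)
E = exact_counts(N)
```

In this example E[k] is the number of elements of the universe that lie in exactly k of the three sets.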
4.1.17. Let $S_i$ be the subset of {1, ..., n} divisible by $a_i$. Then $N_t$ is the sum over all t-subsets T of A of
⌊n/lcm(T)⌋, where lcm(T) is the least common multiple of the elements of T and the floor ⌊x⌋ is
the largest integer not exceeding x. For the various parts of the exercise you need the following.

• If all elements of A divide n, then ⌊n/lcm(T)⌋ = n/lcm(T).

• If no two elements of A have a common factor, then $\mathrm{lcm}(T) = \prod_{i \in T} i$.

(a) Using the previous comments in the special case k = 0, we obtain after some algebra

$$\sum_{i \ge 0} (-1)^i N_i = n \prod_{a \in A} \left( 1 - \frac{1}{a} \right),$$

which is the Euler phi function when A is the set of prime divisors of n.

(b) The comment for (a) applies in this case as well.

(c) There is no simple formula even when k = 0 because the floor function cannot be eliminated.

(d) Now we cannot even eliminate the lcm function.

4.1.19.  In  all  cases,  what  we  must  do  is  prove  that  (P-1),  (P-2)  and  (P-3)  hold.  We  omit  most  of 

them. 

(d) Since x/x = 1, (P-1) is true. Suppose that x ρ y and y ρ x. Then x/y and y/x are both integers.
Since  {x/y){y/x)  =  1,  the  only  possible  integer  values  for  x/y  and  y/x  are  ±1.  Since  x  and 
y  are  positive,  it  follows  that  x/y  =  1  and  so  (P-2)  is  true.  Suppose  that  x/y  and  y/z  are 
integers.  Then  so  is  x/z  and  so  (P-3)  is  true. 

4.1.21. Since every set is the union of itself, x ρ x. Suppose x ρ y and y ρ x. Let $b_y$ be a block of y.
Since x ρ y, $b_x \subseteq b_y$ for some block $b_x$ of x. Since y ρ x, $b'_y \subseteq b_x$ for some block $b'_y$ of y. Since blocks of
a partition are either equal or disjoint and since $b'_y \subseteq b_x \subseteq b_y$, we have $b'_y = b_y$ and so $b_x = b_y$. This
proves that every block of y is a block of x. Hence x = y and so (P-2) is true. It is easy to prove
(P-3).

4.1.23 (a) With each element $s \in S$, associate a set g(s) such that $s \in S_i$ if and only if $i \in g(s)$.
Then $E_k$ counts those $s \in S$ for which |g(s)| = k. Since the number of $s \in S$ with g(s) = y is
e(y), the sum of e(y) over |y| = k also counts those s.

(b) An element s is counted in (4.14) if and only if it belongs to all $S_i$ for which $i \in x$. This is the
same as the definition of the set intersection.

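(c) The sum of e(x) over all x of size k is $E_k$. Putting this together with (4.15), we have

\[ E_k = \sum_{|x|=k}\ \sum_{y\supseteq x} (-1)^{|y|-|x|} f(y) = \sum_{|y|\ge k}\ \sum_{\substack{x\subseteq y\\ |x|=k}} (-1)^{|y|-k} f(y) = \sum_{|y|\ge k} \binom{|y|}{k} (-1)^{|y|-k} f(y). \]

The sum of f(y) in (b) over all y of size t is $N_t$. Collecting terms according to |y|, we have

\[ E_k = \sum_{t\ge k} \binom{t}{k} (-1)^{t-k} N_t = \sum_{i\ge 0} (-1)^i \binom{k+i}{k} N_{k+i}, \]

where we set t = i + k. Now use $\binom{k+i}{k} = \binom{k+i}{i}$.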
Section 4.2

4.2.1. The number of 6-long sequences made with B, R and W is $3^6 = 729$, which is much too
long. The number of 6-long sequences in which adjacent beads differ in color is $3 \times 2^5 = 96$, which
is more manageable, but still quite long. We won't list them. We could "cheat" by being a bit less
mechanical: If the necklace contains a B, we could start with it. There are $2^5 = 32$ such necklaces,
a manageable number. The only necklace without B must alternate R and W, so there is only one
of them. Here are the 32 other necklaces, where a number preceding a necklace is the first place it
appears in the list when considered circularly or flipped over. A zero means it was rejected because
the first and last beads are the same.


 1: BRBRBR     2: BRBRBW     0: BRBRWB     3: BRBRWR
 2: BRBWBR     4: BRBWBW     0: BRBWRB     5: BRBWRW
 0: BRWBRB     6: BRWBRW     0: BRWBWB     7: BRWBWR
 3: BRWRBR     8: BRWRBW     0: BRWRWB     9: BRWRWR
 2: BWBRBR     4: BWBRBW     0: BWBRWB     8: BWBRWR
 4: BWBWBR    10: BWBWBW     0: BWBWRB    11: BWBWRW
 0: BWRBRB     7: BWRBRW     0: BWRBWB     6: BWRBWR
 5: BWRWBR    11: BWRWBW     0: BWRWRB    12: BWRWRW
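The listing above can be cross-checked by brute force. The following is a minimal sketch, not the decision-tree method of the text: it assumes a necklace is a circular arrangement of 6 beads in which adjacent beads (including the first and last) differ, with rotations and flips regarded as equivalent. The function names are illustrative.

```python
from itertools import product

def canonical(seq):
    """Lex least string among all rotations of seq and of its reversal."""
    n = len(seq)
    forms = []
    for s in (seq, seq[::-1]):
        forms.extend(s[i:] + s[:i] for i in range(n))
    return min(forms)

def proper_necklaces(colors="BRW", n=6):
    """Canonical forms of circular sequences whose adjacent beads differ."""
    found = set()
    for beads in product(colors, repeat=n):
        s = "".join(beads)
        if all(s[i] != s[(i + 1) % n] for i in range(n)):
            found.add(canonical(s))
    return found

necklaces = proper_necklaces()
with_b = {s for s in necklaces if "B" in s}
```

Under these assumptions there are 12 necklaces containing a B and one without, matching the labels 1 through 12 plus the alternating R/W necklace.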

4.2.3  (a)    Since  4  beads  are  used,  at  most  4  different  kinds  of  beads  are  used.  We  can  construct 

an  arrangement  of  beads  by  choosing  the  number  of  types  that  must  appear  (1,  2,  3  OR  4), 
choosing  that  many  types  of  beads  from  the  r  types  AND  then  choosing  an  arrangement  using 
all  of  the  types  of  beads  that  we  chose. 

(b) Trivially, f(1) = 1. For f(2), our decision will be the number of beads of the first type that
appear. After that, it is easy. This gives us 1 + 2 + 1 = 4. For f(3), our decision will be which
bead appears twice. This gives us 3 × 2 = 6. For f(4), each bead appears once and there are
3 possibilities. Thus the number of arrangements is

\[ \binom{r}{1} + 4\binom{r}{2} + 6\binom{r}{3} + 3\binom{r}{4}, \]

which can be rewritten as $r(r+1)(r^2+r+2)/8$, if desired.
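As a sanity check on the values f(1), ..., f(4) and the closed form, here is a small brute-force sketch. It assumes the arrangements are circular arrangements of 4 beads with rotations and flips regarded as equivalent; the function names are hypothetical.

```python
from itertools import product
from math import comb

def count_arrangements(n, r):
    """Count circular arrangements of n beads from r types, up to rotation and flip."""
    seen = set()
    for beads in product(range(r), repeat=n):
        forms = []
        for s in (beads, beads[::-1]):
            forms.extend(s[i:] + s[:i] for i in range(n))
        seen.add(min(forms))
    return len(seen)

F = {1: 1, 2: 4, 3: 6, 4: 3}   # f(k) from part (b)

def by_decision_tree(r):
    """Choose k types, then arrange using all k of them."""
    return sum(comb(r, k) * F[k] for k in F)

def closed_form(r):
    return r * (r + 1) * (r * r + r + 2) // 8
```

For small r the three computations can be compared directly, for example `count_arrangements(4, 3)`, `by_decision_tree(3)` and `closed_form(3)`.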

4.2.5.  The  problem  can  be  solved  by  either  decision  tree  method.  It  is  useful  to  note  that  all 
solutions  must  begin  with  h  because  any  board  that  starts  with  v  can  be  flipped  about  a  NW-SE 
(135°)  diagonal  to  give  one  that  starts  with  h.  Also  note  that  a  lexically  least  sequence  that  starts 
with  hv  determines  the  entire  sequence.  (To  see  this,  note  that  it  starts  hvv  and  look  at  rotations 
of  the  board.) 

We  will  use  the  second  method.  Our  first  decision  will  be  the  number  of  entire  rows  and/or 
columns  that  are  covered  by  two  whole  dominoes.  For  example,  two  dominoes  in  the  top  row  or 
two  dominoes  in  the  third  column.  Note  that  we  cannot  simultaneously  cover  a  row  and  a  column 
because  they  overlap.  Let  the  number  be  L.  The  possible  values  of  L  are  0,  1,  2  and  4.  (You  should 
find it easy to see why L = 3 is impossible.) Note that we can always use the symmetries to make
the first domino horizontal. For L = 4, there is obviously only one solution and its lex minimal form
is hhhhhhhh. For L = 0, we use Method 1 to obtain hvvhvvhh as the only solution. (Beware:
reading  the  sequence  in  reverse  does  not  correspond  to  a  symmetry  of  the  board.)  For  L  =  1,  we  note 
that  the  entire  row  or  column  must  be  at  the  edge  of  the  board.  Suppose  it  is  the  first  row.  Refer 
back  to  Figure  3.15  to  see  that  the  only  way  to  complete  the  board  without  increasing  L  is  hvvvvh. 
This  is  already  lex  minimal:  hhhvvvvh.  Suppose  L  =  2.  By  rotation,  we  can  assume  we  have  two 
full  rows  and,  because  they  cannot  be  in  the  middle,  one  of  them  is  the  first  row.  Again,  refer  to 
Figure  3.15  to  find  how  many  ways  we  can  complete  the  board  with  one  more  horizontal  row.  This 
leads  to  six  solutions:  hhhhhvvh,  hhhvvhhh,  hhhhvhvh,  hhvhvhhh,  hhhhvvvv  and  hhvvvvhh.  This 
gives  a  total  of  nine  solutions. 
4.2.7. When we write out our answers, they will be in the form suggested in the problem, without
the surrounding boxes. To obtain the lex least solutions, we must linearly order the faces. Our order
will be the line of four side faces from left to right, then the top and, finally, the bottom. We use B,
R and W to denote the colors, and b, r and w to denote the number of faces of each color.

(a) Our first decision will be the number of black faces. By interchanging black and white, a
solution with b black faces can be converted to one with 6 − b, so we only need look at b = 0, 1,
2 and 3. For b = 0 and b = 1, there is obviously only one solution. For b = 2, we must decide
whether to put the second black face adjacent or opposite the first one. Here are the 4 solutions
for b < 3.

  W         W         W         W
 WWWW      BWWW      BBWW      BWBW
  W         W         W         W

For b = 3, our second decision is whether or not all three black faces share a common vertex.
This leads to just 2 solutions:

  B         W
 BBWW      BBBW
  W         W

Doubling the answers for b < 3 to get those for b > 3 gives us 10 solutions.

(b) In the previous solution, we can limit ourselves to b ≤ 3. When b = 3, we need to check whether
or not one solution is converted to the other when black and white are interchanged. They are
not, so b = 3 still gives 2 solutions for a total of 6.

(c)  The  mirror  image  of  each  of  the  10  solutions  is  equivalent  to  itself,  so  there  are  still  10  solutions. 

(d) Our first decision will be the list b, r, w. By interchanging colors, we need only consider the
situations where b ≤ r ≤ w. This gives us 1,1,4, 1,2,3 and 2,2,2. Interchanging colors in all
possible ways gives rise to 3, 6 and 1 solutions, respectively, for each solution found. For 1,1,4,
our  decision  will  be  whether  B  and  R  are  on  adjacent  or  opposite  faces.  Each  leads  to  one 
coloring.  For  1,2,3  our  first  decision  will  be  the  number  of  R's  that  are  adjacent  to  the  B. 
One  adjacency  gives  1  solution  and  two  give  2  solutions,  depending  on  whether  the  R's  are 
adjacent  or  opposite  each  other.  For  2,2,2,  our  first  decision  will  be  whether  or  not  the  B's 
are  adjacent  or  opposite.  Our  second  decision  will  be  whether  or  not  the  R's  are  adjacent  or 
opposite.  Each  choice  leads  to  1  solution  except  when  the  B's  are  adjacent  and  the  R's  are 
adjacent.  In  this  case  there  are  more  solutions.  One  possibility  is  to  have  the  4  sides  be  BBRR. 
Another  possibility  is  to  have  the  4  sides  be  BBRW  and  then  place  the  additional  R  on  either 
the  top  or  the  bottom.  These  last  two  possibilities  are  mirror  images  of  each  other,  but  we 
cannot  transform  one  to  the  other  with  just  rotations.  The  solutions  are  given  in  Figure  S.4.1. 
This gives us 2 × 3 + 3 × 6 + 6 × 1 = 30 solutions.

(e) If all 3 colors appear, there are 30 solutions. If only 1 color appears, there are obviously
3 solutions. If exactly 2 colors appear, we can first choose the 2 colors AND then use
them. By the first part of this exercise, there are 10 − 2 = 8 ways to use the colors so that both
appear. Thus we have $30 + 3 + \binom{3}{2}8 = 57$ solutions.
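The counts 10, 30 and 57 from parts (a), (d) and (e) can be confirmed by brute force. The sketch below is an independent check rather than the decision-tree method used above. The face numbering and the two generating quarter turns are choices made for this illustration; colorings are regarded as equivalent when one is a rotation of the other.

```python
from itertools import product

# Faces: 0=top, 1=bottom, 2=front, 3=back, 4=left, 5=right.
# A rotation is a tuple p: after rotating, face i shows the color old[p[i]].
SPIN = (0, 1, 4, 5, 3, 2)   # quarter turn about the vertical axis
TIP = (2, 3, 1, 0, 4, 5)    # quarter turn about the left-right axis

def closure(generators):
    """All products of the generators (the rotation group they generate)."""
    group = {tuple(range(6))}
    frontier = list(group)
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = tuple(g[h[i]] for i in range(6))
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

GROUP = closure([SPIN, TIP])

def count_cubes(colors, keep=lambda c: True):
    """Number of inequivalent colorings c (restricted to those with keep(c))."""
    seen = set()
    for c in product(colors, repeat=6):
        if keep(c):
            seen.add(min(tuple(c[g[i]] for i in range(6)) for g in GROUP))
    return len(seen)

two_colors = count_cubes("BW")
three_colors = count_cubes("BRW")
all_three_appear = count_cubes("BRW", lambda c: len(set(c)) == 3)
```

The group has 24 elements, and the three counts agree with parts (a), (e) and (d), respectively.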

(f)  Note  that  no  color  can  appear  more  than  3  times  on  any  given  cube.  Also  note  that  at  most 
6  colors  appear  on  any  given  cube.  By  looking  over  our  previous  work,  we  find,  in  the  notation 
of Exercise 4.2.4, that f(0) = f(1) = 0, f(2) = 1 and f(3) = 8. By looking at decision trees for
the color counts 1,1,1,3 and 1,1,2,2, we find that f(4) = 32. Consider f(5) which
has  just  the  one  color  count  list  1,1,1,1,2.  There  is  one  way  to  place  the  repeated  colors.  The 
Figure S.4.1    The distinct painted cubes with various numbers of faces painted Black, Red and White.


partially colored cube can be transformed into itself by leaving it fixed or by rotating it so that
the two colored faces are interchanged. This means that whenever we color the remaining 4 faces
with 4 distinct colors, there will be exactly one other coloring that is equivalent to it. Thus
$f(5) = \binom{5}{1}(4!/2) = 60$. If you experiment a bit, you will discover that there are 24 symmetries
of the cube. If all the faces are colored differently, each of the symmetries leads to an equivalent
coloring that looks different. Thus f(6) = 6!/24 = 30. Putting all this together, we have


Section  4.3 

4.3.1. The image of F is all k element subsets of n. $F^{-1}(x)$ consists of all possible ways to arrange the
elements of x in a list. Since we are able to count lists, we know that there are k! such arrangements.
We also know that |A| = n!/(n − k)!. Thus the coimage of F consists of C(n, k) blocks all of size k!
and the union of these blocks has n!/(n − k)! elements. Thus C(n, k) = n!/((n − k)! k!).

4.3.3. Note that N(γ) = 0 unless γ ∈ Pg or γ ∈ P5. In the former case, $N(\gamma) = \binom{8}{3} = 56$ and in the
latter case, $N(\gamma) = \binom{2}{1}\binom{3}{1} = 6$. Thus there are (56 + 4 × 6)/16 = 5 necklaces.
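The arithmetic (56 + 4 × 6)/16 = 5 is consistent with counting necklaces of 8 beads, 3 of one color and 5 of another, under the 16 rotations and flips. That reading of the exercise is an assumption made here for illustration; under it, a direct enumeration gives the same answer.

```python
from itertools import combinations

def canonical(seq):
    """Lex least string among all rotations of seq and of its reversal."""
    n = len(seq)
    forms = []
    for s in (seq, seq[::-1]):
        forms.extend(s[i:] + s[:i] for i in range(n))
    return min(forms)

def count_necklaces(n, black):
    """Necklaces with n beads, exactly `black` of them black (assumed setting)."""
    classes = set()
    for pos in combinations(range(n), black):
        s = "".join("B" if i in pos else "W" for i in range(n))
        classes.add(canonical(s))
    return len(classes)

answer = count_necklaces(8, 3)
```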

4.3.5 (a) The second line consists of the first line circularly shifted by c, an integer between 0 and
n − 1; i.e., the second line is $s_1, s_2, \ldots, s_n$, where $s_t = c + t$ if this is at most n and c + t − n,
otherwise.

(b)  In  addition  to  the  elements  of  the  cyclic  group,  we  have  permutations  whose  second  lines  are 
cyclic  shifts  of  n, . . . ,  2, 1. 

(c)  There  are  0,  1  or  2  cycles  of  length  1  and  the  remaining  cycles  are  all  of  length  2.  If  n  is 
odd,  there  is  always  exactly  one  cycle  of  length  1.  If  n  is  even,  there  is  never  exactly  one  cycle 
of  length  1.  You  can  write  down  the  cycles  as  follows.  All  numbers  that  are  mentioned  are 
understood  to  have  an  appropriate  multiple  of  n  added  to  (or  subtracted  from)  them  so  that 
they lie between 1 and n inclusive. If n is odd, choose a cycle (k). The remaining cycles are
(k − t, k + t) where 1 ≤ t < n/2. If n is even, choose k ≤ n/2. There are two ways to proceed.
First, we could have all cycles of the form (k − t + 1, k + t) where 1 ≤ t ≤ n/2. Second, we
could have (k), (k + n/2) and all cycles of the form (k − t, k + t) where 1 ≤ t < n/2.

4.3.7. The proof in the text shows that the right side of the given equality is $|G| \sum_{g\in G} N(g)$. By
(4.20), the left side is

yes  yes  l^^l 

The  rest  of  the  proof  follows  easily  by  adapting  what  was  done  in  the  text.  This  seems  to  be  a 
shorter  proof  than  the  one  in  the  text.  Why  didn't  we  use  it?  First,  it's  not  particularly  shorter; 

however,  it  is  a  bit  cleaner.  Unfortunately,  it  requires  starting  with  the  completely  unmotivated 
double  summation  in  which  we  have  interchanged  the  order  of  the  sums. 


Section  5.1 

5.1.1. The sum is the number of ends of edges since, if x and y are the ends of an edge, the edge
contributes 1 to the value of d(x) and 1 to the value of d(y). Since each edge has two ends, the sum
is twice the number of edges.

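5.1.3. The graph with

\[ \begin{pmatrix} a & b & c & d & e & f & g & h & i & j & k \\ c & c & f & a & h & e & e & a & d & a & a \\ c & g & g & h & h & h & f & h & g & d & f \end{pmatrix} \]

is isomorphic to Q. The correspondence between vertices is given by

\[ \begin{pmatrix} a & b & c & d & e & f & g & h \\ h & a & c & e & f & d & g & b \end{pmatrix} \]

where the top row corresponds to the vertices of Q. The graph with

\[ E = \{1,2,3,4,5,6,7,8,9,10,11\} \quad\text{and}\quad \varphi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ A & E & E & E & F & G & H & B & C & D & E \\ G & H & E & F & G & H & B & C & D & D & H \end{pmatrix} \]

is not isomorphic to Q. One edge needs to be deleted from P'(Q) and one added.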

5.1.5  (a)  There  is  no  graph  Q  with  degree  sequence  (1, 1, 2, 3, 3, 5)  since  the  sum  of  the  degrees  is 
odd. 

(b) There is such a graph. You should draw an example.

(c) Up to labeling, the graph is unique. Take V = {1, . . . , 6} and

E  =  {{1,6},  {2,6},  {2,4},  {3,6},  {3,5},  {4,6},  {4,5},  {5,6}} 

(d) A graph with degree sequence (3,3,3,3) has (3 + 3 + 3 + 3)/2 = 6 edges and, of course, 4
vertices. That is the maximum number $\binom{4}{2}$ of edges that a simple graph with 4 vertices can have. It is easy to
construct such a graph. This graph is called the complete graph on 4 vertices.

(f)  There  is  no  simple  graph  (or  graph  without  loops  or  parallel  edges)  with  degree  sequence 

(3,3,3,5). 

(g)  Similar  arguments  to  the  (3,3,3,3)  case  apply  to  the  complete  graph  with  degree  sequence 
(4,4,4,4,4). 
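The parity argument of part (a) and the edge set of part (c) are easy to check mechanically. This short sketch computes the degree sequence of the graph given in (c) and confirms that the degrees sum to twice the number of edges, as in Exercise 5.1.1; the variable names are illustrative.

```python
from collections import Counter

# Edge set from part (c), with V = {1, ..., 6}.
E = [(1, 6), (2, 6), (2, 4), (3, 6), (3, 5), (4, 6), (4, 5), (5, 6)]

degree = Counter(v for edge in E for v in edge)
sequence = sorted(degree[v] for v in range(1, 7))

def degree_sum_is_even(seq):
    """Necessary condition for a degree sequence: the sum is twice |E|."""
    return sum(seq) % 2 == 0
```

Here `degree_sum_is_even((1, 1, 2, 3, 3, 5))` is false, which is the obstruction used in part (a).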
         all                           injections                  surjections

 L L     $b^a$                         $b(b-1)\cdots(b-a+1)$       $b!\,S(a,b)$

 L U     $\sum_{k\le b} S(a,k)$        1                           $S(a,b)$

 U L     $\binom{a+b-1}{a}$            $\binom{b}{a}$              $\binom{a-1}{a-b}$

 U U     $\sum_{k\le b} p(a,k)$        1                           $p(a,b)$

Figure S.5.1    Some basic enumeration problems.


Section  5.2 

5.2.1. Let ν and ε be the bijections.

(a) This follows from the fact that ν and ε are bijections.

(b) This can be seen intuitively from the drawing of the unlabeled graph. If you want a more formal
proof, first note that the degree of a vertex v is the number of edges e such that v ∈ φ(e). Now
use the fact that v ∈ φ(e) is equivalent to ν(v) ∈ φ'(ε(e)).

5.2.3 (a) This is exactly like the next problem with the transpose, $^t$, replaced by inverse, $^{-1}$,
everywhere.

(b) Let I be the n × n identity matrix. Since $A = IAI^t$, A ~ A. Suppose that A ~ B. Then
$B = PAP^t$ for some nonsingular P. Multiplying on the left by $P^{-1}$ and on the right by
$(P^{-1})^t = (P^t)^{-1}$, we have

\[ P^{-1}B(P^{-1})^t = (P^{-1}P)A(P^t(P^{-1})^t) = (P^{-1}P)A(P^t(P^t)^{-1}) = A. \]

Thus B ~ A. Suppose that A ~ B ~ C. Then we have nonsingular P and Q such that
$B = PAP^t$ and $C = QBQ^t$. Thus $C = Q(PAP^t)Q^t = (QP)A(P^tQ^t) = (QP)A(QP)^t$. This
proves transitivity.

5.2.5. Let $E \subseteq \mathcal{P}_2(V)$ and $E' \subseteq \mathcal{P}_2(V')$. Write G = (V, E) ~ (V', E') = G' if and only if there is a
bijection ν: V → V' such that {u, v} ∈ E if and only if {ν(u), ν(v)} ∈ E'.

We could show that this is an equivalence relation by adapting the proof in Example 5.5.
An alternative is to show how this definition leads to the equivalence relation for G and G'
interpreted as graphs. We'll take this approach. In this case φ and φ' are identity maps. Define
ε({u, v}) = {ν(u), ν(v)}. By our definition in the previous paragraph, ε: E → E' is a bijection.
Since φ and φ' are the identity, the requirement that φ'(ε(e)) = ν(φ(e)) in the definition of graph
isomorphism is satisfied.

5.2.7.  The  table  is  shown  in  Figure  S.5.1.  The  entries  which  are  1  follow  when  you  realize  what  is 

being  counted.  The  LL  row  corresponds  to  ordered  samples  and  the  UL  row  to  unordered  samples, 
which  have  been  considered  in  Chapter  1.  The  UL-surjection  entry  comes  from  the  realization  that 
our sample allows repetition but must include every element in b so that we are only free to choose
a  —  b  additional  elements.  In  the  LU  row,  the  fact  that  the  range  is  unlabeled  means  that  we  can 
only  distinguish  functions  that  have  different  coimages.  The  UU  row  is  associated  with  partitions  of 
numbers.  We  use  p{n,  k)  to  denote  the  number  of  partitions  of  n  having  exactly  k  parts. 
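The entries of Figure S.5.1 can be spot-checked by enumerating functions from an a-set to a b-set and forgetting labels. The sketch below is one way to do that, under the assumption that "U" for the domain means we keep only the multiset of values and "U" for the range means we keep only the coimage; all helper names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb

def relabel(f):
    """Forget range labels: rename values by order of first appearance."""
    names = {}
    return tuple(names.setdefault(v, len(names)) for v in f)

VIEWS = {
    "LL": lambda f: f,                                     # both labeled
    "LU": relabel,                                         # range unlabeled
    "UL": lambda f: tuple(sorted(f)),                      # domain unlabeled
    "UU": lambda f: tuple(sorted(Counter(f).values())),    # both unlabeled
}

def table(a, b):
    """Counts for each row (LL, LU, UL, UU) and column of the figure."""
    kinds = {
        "all": lambda f: True,
        "injections": lambda f: len(set(f)) == a,
        "surjections": lambda f: len(set(f)) == b,
    }
    funcs = list(product(range(b), repeat=a))
    return {
        (row, col): len({view(f) for f in funcs if keep(f)})
        for row, view in VIEWS.items()
        for col, keep in kinds.items()
    }

t43 = table(4, 3)
t23 = table(2, 3)
```

For example, `t43[("UL", "all")]` should equal `comb(4 + 3 - 1, 4)` and `t43[("UL", "surjections")]` should equal `comb(4 - 1, 4 - 3)`.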
Section 5.3

5.3.1. Since $E \subseteq \mathcal{P}_2(V)$, we have a simple graph. Regardless of whether you are in set C or S,
following  an  edge  takes  you  into  the  other  set.  Thus,  following  a  path  with  an  odd  number  of  edges 
takes  you  to  the  opposite  set  from  where  you  started  while  a  path  with  an  even  number  of  edges 
takes  you  back  to  your  starting  set.  Since  a  cycle  returns  to  its  starting  vertex,  it  obviously  returns 
to  its  starting  set. 

5.3.3 (a) Let e = {u, v} and let f = {v, w} be the other edge. Since G is simple, u ≠ w. Since e is
a cut edge, u and v are in separate components of (V, E − {e}). Thus so are u and w. Since the
graph induced by V − {v} is a subgraph of (V, E − {e}), u and w are in separate components
of it as well.


(b)  Take  two  triangles  and  identify  their  tops.  The  merged  top  is  a  cut  vertex  but  the  graph  has 
no  isthmus. 

(c) We will prove that e is a cut edge if and only if its ends u and v, say, lie in different components
of G' = (V, E − {e}). The result will then follow because, first, if C is a cycle containing e,
removal of e does not leave its ends in different components, and, second, if u and v are in the
same component of G', then there is a path P connecting them in G' and P and e form a
cycle in G.

Now back to the original claim. If u and v are in different components of G', then e is a
cut edge. Suppose e is a cut edge of G. Since G is connected and every path in G that is not
a path in G' contains e, it follows that if x and y are in different components of G' any path
connecting them in G contains e. Let P be such a path and let u be the end of e first reached
on P when starting from x. It follows that x and u are in one component of G' and that y and
v (the other end of e) are in one component, too. Since x and y are in different components, so
are u and v.

(d) We claim that v ∈ V is a cut vertex of G if and only if there are two edges e and e' both
containing v such that no cycle of G contains both e and e'.

Proof. Suppose that v is a cut vertex. Let x and y belong to different components of the
graph G'' induced by V − {v}. Any path from x to y in G must include v. Let P be such a
path and let e and e' be the two edges in P that contain v. If e and e' were on a cycle C in
G, then we could remove e and e' from P and add on C − {e, e'} to obtain a route from x
to y that does not go through v. Since this contradicts the fact that x and y are in different
components of G'', it follows that e and e' do not lie in a cycle.

The steps can be reversed to prove that if e and e' are edges incident with v that do not
lie on a cycle, then v is a cut vertex: Let x and y be the other vertices on e and e'. Since e and
e' do not lie on a cycle, every path from x to y must include either e or e' (or both), and hence
includes v. Since there is no path from x to y not including v, they are in different components
of G''.
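The characterization in part (c), that e is a cut edge exactly when its ends lie in different components after e is removed, translates directly into a test. The sketch below assumes a connected simple graph given as a list of vertex pairs and uses the two-triangle example of part (b); the function names are illustrative.

```python
def connected(edges, s, t):
    """Is there a path from s to t using only the given edges?"""
    seen, stack = {s}, [s]
    while stack:
        x = stack.pop()
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                if a == x and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return t in seen

def cut_edges(edges):
    """Edges whose ends are separated when the edge is removed (part (c))."""
    return [e for e in edges
            if not connected([f for f in edges if f != e], e[0], e[1])]

def cut_vertices(vertices, edges):
    """Vertices whose removal disconnects the remaining vertices."""
    result = []
    for v in vertices:
        rest = [x for x in vertices if x != v]
        kept = [e for e in edges if v not in e]
        if any(not connected(kept, rest[0], y) for y in rest[1:]):
            result.append(v)
    return result

# Two triangles sharing vertex 3, as in part (b).
bowtie = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
```

On the bowtie, every edge lies on a cycle, so there is no cut edge, while the shared vertex is a cut vertex.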

5.3.5  (a)   The  graph  is  not  Eulerian.  The  longest  trail  has  5  edges,  the  longest  circuit  has  4  edges. 

(b)  The  longest  trail  has  9  edges,  the  longest  circuit  has  8  edges. 

(c)  The  longest  trail  has  13  edges  (an  Eulerian  trail  starting  at  C  and  ending  at  D).  The  longest 
circuit  has  12  edges. 

(d)  This  graph  has  an  Eulerian  circuit  (12  edges). 
Section 5.4

5.4.1.  We  first  prove  that  (b)  and  (c)  are  equivalent.  We  do  this  by  showing  that  the  negation  of 

(b) and the negation of (c) are equivalent. Suppose u ≠ v are on a cycle of G. By Theorem 5.3,
there are two paths from u to v. Conversely, suppose there are two paths from u to v. Call them
$u = x_0, x_1, \ldots, x_k = v$ and $u = y_0, y_1, \ldots, y_m = v$. Let i be the smallest index such that $x_i \ne y_i$. We
may assume that i = 1 for, if not, redefine $u = x_{i-1}$. On the new paths, let a be the smallest
index a > 0 for which $x_a$ is on the y path, say $x_a = y_b$. The walk

$u = x_0, x_1, \ldots, x_a = y_b, y_{b-1}, \ldots, y_0 = u$

has no repeated vertices except the first and last and so is a cycle. (A picture may help you visualize
what is going on. Draw the x path intersecting the y path several times.)

We now prove that (d) implies (b). Suppose that G has a cycle, $v_0, v_1, \ldots, v_k, v_0$. Remove the
edge $\{v_0, v_k\}$. In any walk that uses that edge, replace it with the path $v_0, v_1, \ldots, v_k$ or its reverse,
as appropriate. Thus the graph is still connected and so the edge $\{v_0, v_k\}$ contradicts (d).

5.4.3 (a) By Exercise 5.1.1, we have $\sum_{v\in V} d(v) = 2|E|$. By 5.4(e), |E| = |V| − 1. Since

\[ 2|V| = \sum_{v\in V} 2, \quad\text{we have}\quad 2 = 2|V| - 2|E| = \sum_{v\in V} (2 - d(v)). \]

(b)  We  give  three  solutions.  The  first  uses  the  previous  result.  The  second  uses  the  fact  that  each 
tree  except  the  single  vertex  has  at  least  two  leaves.  The  third  uses  the  fact  that  trees  have  no 
cycles. 

Suppose that T is more than just a single vertex. Since T is connected, d(v) ≠ 0 for all v.
Let $n_k$ be the number of vertices of T of degree k. By the previous result, $\sum_{k\ge 1} (2-k)n_k = 2$.
Rearranging gives $n_1 = 2 + \sum_{k\ge 2} (k-2)n_k$. If $n_m \ge 1$, the sum is at least m − 2.

For  the  second  solution,  remove  the  vertex  of  degree  m  to  obtain  m  separate  trees.  Each 
tree  is  either  a  single  vertex,  which  is  a  leaf  of  the  original  tree,  or  has  at  least  two  leaves,  one 
of  which  must  be  a  leaf  of  the  original  tree. 

For the third solution, let v be the vertex of degree m and let $\{v, x_i\}$ be the edges
containing v. Each path starting $v, x_i$ must eventually reach a leaf since there are no cycles.
Call the leaf $y_i$. These leaves are distinct since, if $y_i = y_j$, the walk $v, x_i, \ldots, y_i = y_j, \ldots, x_j, v$
would lead to a cycle.

(c) Let the vertices be u and $v_i$ for 1 ≤ i ≤ m. Let the edges be $\{u, v_i\}$ for 1 ≤ i ≤ m.

(d) Let $N = n_3 + n_4 + \cdots$, the number of vertices of degree 3 or greater. Note that k − 2 ≥ 1
for k ≥ 3. By our earlier formula, $n_1 \ge 2 + N$. If $n_2 = 0$, $N = |V| - n_1$ and so we have
$n_1 \ge 2 + |V| - n_1$. Thus $n_1 \ge 1 + |V|/2$. Similarly, if $n_2 = 1$, $N = |V| - n_1 - 1$ and, with a bit
of algebra, $n_1 \ge (1 + |V|)/2$.

(e)  A  careful  analysis  of  the  previous  argument  shows  that  the  number  of  leaves  will  be  closest  to 
\V\/2  if  we  avoid  vertices  with  high  degrees.  Thus  we  will  try  to  make  our  vertices  of  degree 
three or less. We will construct some RP-trees $T_k$ with k leaves. Let $T_1$ be the isolated vertex. For
k > 1, let the root of $T_k$ have two children, one a single vertex and the other the root of $T_{k-1}$. Clearly $T_k$
has one more leaf and one more nonleaf than $T_{k-1}$. Thus the difference between the number
of leaves and nonleaves is the same for all $T_k$. For $T_1$ it is one.
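The identities in parts (a), (b) and (e) can be checked on the trees $T_k$ just described. The following sketch builds the edge list of $T_k$ with an arbitrary vertex numbering chosen for this illustration and tallies $n_k$, the number of vertices of each degree.

```python
from collections import Counter

def tree_edges(k):
    """Edges of T_k: T_1 is a single vertex (labeled 0); the root of T_k
    has a new leaf and the root of T_{k-1} as its two children."""
    edges, root, next_label = [], 0, 1
    for _ in range(k - 1):
        leaf, new_root = next_label, next_label + 1
        next_label += 2
        edges += [(new_root, leaf), (new_root, root)]
        root = new_root
    return edges

def degree_counts(edges):
    """Map each degree k to n_k, the number of vertices of that degree."""
    degree = Counter(v for e in edges for v in e)
    return Counter(degree.values())

edges = tree_edges(5)
n = degree_counts(edges)
num_vertices = sum(n.values())
identity_a = sum((2 - k) * n[k] for k in n)          # should equal 2
leaves_from_b = 2 + sum((k - 2) * n[k] for k in n if k >= 2)
```

For k = 5 the tree has 9 vertices, 5 of which are leaves, so the number of leaves exceeds the number of nonleaves by one, as claimed in (e).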

5.4.5.  Since  the  tree  has  at  least  3  vertices,  it  has  at  least  3  —  1  =  2  edges.  Let  e  =  {u,v}  be  an 
edge.  Since  there  is  another  edge  and  a  tree  is  connected,  at  least  one  of  u  and  v  must  lie  on  another 
edge  besides  e.  Suppose  that  u  does.  It  is  fairly  easy  to  see  that  u  is  a  cut  vertex  and  that  e  is  a  cut 
edge. 
5.4.7 (a) The idea is that for a rooted planar tree of height h, having at most 2 children for each
non-leaf, the tree with the most leaves occurs when each non-leaf vertex has exactly 2 children.
You should sketch some cases and make sure you understand this point. For this case $l = 2^h$
and so $\log_2(l) = h$. Any other rooted planar tree of height h, having at most 2 children for each
non-leaf, is a subtree (with the same root) of this maximal-leaf binary tree and thus has fewer
leaves.

(b) The height h can be arbitrarily large.

(c) h = l − 1.

(d) $\lceil \log_2(l) \rceil$ is a lower bound for the height of any binary tree with l leaves. It is easy to see that
you can construct a full binary tree with l leaves and height $\lceil \log_2(l) \rceil$.

(e) $\lceil \log_2(l) \rceil$ is the minimal height of a binary tree.

5.4.9  (a)   A  binary  tree  with  35  leaves  and  height  100  is  possible. 

(b)  A  full  binary  tree  with  21  leaves  can  have  height  at  most  20.  So  such  a  tree  of  height  21  is 
impossible. 

(c)  A  binary  tree  of  height  5  can  have  at  most  32  leaves.  So  one  with  33  leaves  is  impossible. 

(d) A full binary tree with 65 leaves has minimal height $\lceil \log_2(65) \rceil = 7$. Thus a full binary tree
with  65  leaves  and  height  6  is  impossible. 
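The bounds used in Exercises 5.4.7 and 5.4.9 are simple to evaluate exactly with integer arithmetic. The helper names below are hypothetical; the sketch assumes the lower bound $\lceil \log_2(l) \rceil$ for any binary tree with l leaves and the upper bound l − 1 for a full binary tree with l leaves.

```python
def min_height(leaves):
    """ceil(log2(leaves)), computed without floating point."""
    return (leaves - 1).bit_length()

def max_full_height(leaves):
    """Largest height of a full binary tree with the given number of leaves."""
    return leaves - 1

def full_tree_possible(leaves, height):
    """Can a full binary tree have this many leaves and this height?"""
    return min_height(leaves) <= height <= max_full_height(leaves)
```

For instance, `full_tree_possible(21, 21)` and `full_tree_possible(65, 6)` are both false, in line with parts (b) and (d) of 5.4.9.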

5.4.11  (a)   Breadth-first:  MIAJKCEHLBFGD, 

Depth-first:  MICIEIHFHGHDHIMAMJMKLKBKM, 
Pre-order:  MICEHFGDAJKLB, 
Post-order:  CEFGDHIAJLBKM. 

(b)  The  tree  is  the  same  as  in  part  (a),  reflected  about  the  vertical  axis,  with  vertices  A  and  J 
removed. 

(c)  It  is  not  possible  to  reconstruct  a  rooted  plane  tree  given  just  its  pre-order  vertex  list.  A 
counterexample  can  be  found  using  just  three  vertices. 

(d)  It  is  possible  to  reconstruct  a  rooted  plane  tree  given  its  pre-order  and  post-order  vertex  list. 
If  the  root  is  X  and  the  first  child  of  the  root  is  Y ,  it  is  possible  to  reconstruct  the  pre-order 
and  post-order  vertex  lists  of  the  subtree  rooted  at  Y  from  the  pre-order  and  post-order  vertex 
lists  of  the  tree.  In  the  same  manner,  you  can  reconstruct  the  pre-order  and  post-order  vertex 
lists  of  the  subtrees  rooted  at  the  other  children  of  the  root  X.  Now  do  the  same  trick  on  these 
subtrees.  Try  this  approach  on  an  example. 
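The reconstruction idea of part (d) can be sketched as a short recursion. The code below assumes all vertex labels are distinct and represents a tree as a pair (label, list of child subtrees); it is one possible rendering of the procedure described above, applied to the lists from part (a).

```python
def reconstruct(pre, post):
    """Rebuild a rooted plane tree from its pre-order and post-order lists."""
    root = pre[0]
    assert post[-1] == root
    children = []
    pre_rest, post_rest = pre[1:], post[:-1]
    while pre_rest:
        # The first remaining pre-order vertex is the next child Y of the
        # root; Y's subtree ends in post-order exactly where Y appears.
        size = post_rest.index(pre_rest[0]) + 1
        children.append(reconstruct(pre_rest[:size], post_rest[:size]))
        pre_rest, post_rest = pre_rest[size:], post_rest[size:]
    return (root, children)

def breadth_first(tree):
    """Vertex labels level by level, left to right."""
    order, queue = [], [tree]
    while queue:
        label, kids = queue.pop(0)
        order.append(label)
        queue.extend(kids)
    return "".join(order)

tree = reconstruct("MICEHFGDAJKLB", "CEFGDHIAJLBKM")
```

The breadth-first list of the rebuilt tree can then be compared with the one given in part (a).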
Section 5.5

5.5.1. Let D be the domain suggested in the hint and define $f: D \to \mathcal{P}_2(V)$ by f((x, y)) = {x, y}.
Let G(D) = (V, E, φ') where φ'(e) = f(φ(e)).

5.5.3.  Let  V  =  {u,v}  and  E  =  {{u,v),  {v,u)}. 

5.5.5.  You  can  use  the  notation  and  proof  of  Example  5.5  provided  you  change  all  references  to 
two element sets to references to ordered pairs. This means replacing {x, y} with (x, y), {ν(x), ν(y)}
with (ν(x), ν(y)) and $\mathcal{P}_2(V_i)$ with $V_i \times V_i$.

5.5.7. "The statements are all equivalent" means that, given any two statements v and w, we
have a proof that v implies w. Suppose D is strongly connected. Then there is a directed path
$v = v_1, v_2, \ldots, v_k = w$. That means we have proved $v_1$ implies $v_2$, that $v_2$ implies $v_3$ and so on.
Hence $v_1$ implies $v_k$.

5.5.9. Let $e = (u_1, u_2)$. For i = 2, 3, . . ., as long as $u_i \ne u_1$ choose an edge $(u_i, u_{i+1})$ that has not
been used so far. It is not hard to see that $d_{\mathrm{in}}(u_i) = d_{\mathrm{out}}(u_i)$ implies this can be done. In this way we
obtain a directed trail starting and ending at $u_1$. This may not be a cycle, but a cycle containing e
can be extracted from it by deleting some edges.

5.5.11 (a) It's easy to see this pictorially: Suppose there were an isthmus e = {u, v}. Then G
consists of the edge e, a graph $G_1$ containing u, and another graph $G_2$ containing v. Suppose e
is directed as (u, v). Clearly one can get from $G_1$ to $G_2$ but one cannot get back along directed
edges, contradicting strong connectedness.

Here is a more formal proof. Suppose there were such a path, say $v = v_1, v_2, \ldots, v_k = u$.
It does not contain the directed edge e (since e goes in the wrong direction). Now look at the
original undirected graph. We claim removal of {u, v} does not disconnect it. The only problem
would be a path that used {u, v} to get from, say x to y, say x, . . . , x', u, v, y', . . . , y. The walk
$x, \ldots, x', v_k, \ldots, v_2, v_1, y', \ldots, y$ connects x and y without using the edge {u, v}.

(b)   See  Exercise  6.3.14  (p.  170). 

5.5.13 (a) For all x ∈ S, x|x. For all x, y ∈ S, if x|y and x ≠ y, then y does not divide x. For all
x, y, z ∈ S, x|y, y|z implies that x|z.

(b)    The  covering  relation 

H  =  {(2, 4),  (2, 6),  (2, 10),  (2, 14),  (3, 6),  (3, 9),  (3, 15),  (4, 8),  (4, 12),  (5, 10),  (5, 15),  (6, 12),  (7, 14)}. 
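The covering relation in part (b) can be recomputed from the definition: (x, y) is a covering pair when x properly divides y and no element of S lies strictly between them in the divisibility order. The sketch below assumes S = {2, 3, . . . , 15}, which is consistent with the pairs listed but is not stated in this solution.

```python
def covering_relation(S):
    """Pairs (x, y) in S with x | y, x != y and no z in S strictly between."""
    S = list(S)
    pairs = []
    for x in S:
        for y in S:
            if x != y and y % x == 0:
                between = any(z % x == 0 and y % z == 0
                              for z in S if z not in (x, y))
                if not between:
                    pairs.append((x, y))
    return pairs

H = covering_relation(range(2, 16))   # S = {2, ..., 15} is an assumption
```

For example, (2, 8) is excluded because 4 lies between 2 and 8, while (4, 8) is kept.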

5.5.15 (a) There are $n^{n-2}$ trees. Since a tree with n vertices has n − 1 edges, the answer is zero if
q ≠ n − 1. If q = n − 1, there are $\binom{\binom{n}{2}}{n-1}$ graphs. Thus the answer is $n^{n-2}\binom{\binom{n}{2}}{n-1}^{-1}$ when q = n − 1.
(b)   We  have 

\n-\)   ^  (n-1)!  ~     2»-i  (n-1)!     ^  2"-Ve"-i  ^  vYJ 
Using  this  in  the  answer  to  (a)  gives  the  result  we  want.  It  turns  out  that 

)      ~  vW2n(2/e)", 

which  differs  from  our  estimate  by  a  constant  times  n^^^. 
Section 5.6

5.6.1. Let A and B be the partition of the vertices guaranteed by the definition of a bipartite graph.
Let k = |A|, number the vertices in A with 1 to k and those in B with k + 1 to n. Since no edges
connect vertices in A to each other, A(G) has a k × k block of zeroes in its upper left corner. Similarly
B gives a block in the lower right corner.

5.6.3 (a) $a^{(k)}_{i,j}$ is the sum over all $t_1, \ldots, t_{k-1}$ of $a_{i,t_1} a_{t_1,t_2} \cdots a_{t_{k-1},j}$. Each of these products is 0 or
1, so the sum is nonzero if and only if some product is nonzero. This happens if and only if
each factor in the product is nonzero. This happens if and only if the vertices $i, t_1, \ldots, t_{k-1}, j$
form a walk.
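
The identity in (a) can be checked on a small example. The following is a minimal sketch, assuming a 0/1 adjacency matrix stored as a list of lists; the helper names `mat_mul`, `mat_pow` and `count_walks` and the example graph are illustrative and not from the text. It compares the entries of the k-th power of the matrix with a direct enumeration of the intermediate vertices $t_1, \ldots, t_{k-1}$.

```python
from itertools import product

def mat_mul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, k):
    """Return A^k by repeated multiplication (k >= 0)."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result

def count_walks(A, i, j, k):
    """Count k-edge walks from i to j by trying every choice of t_1..t_{k-1}."""
    n = len(A)
    total = 0
    for ts in product(range(n), repeat=k - 1):
        seq = (i, *ts, j)
        # The product a_{i,t1} a_{t1,t2} ... is 1 exactly when seq is a walk.
        if all(A[seq[s]][seq[s + 1]] for s in range(k)):
            total += 1
    return total

# Example (an assumption for illustration): the path graph 0-1-2-3.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
```

Each entry of `mat_pow(A, k)` then counts the k-edge walks between the corresponding pair of vertices, so it is nonzero exactly when such a walk exists.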

(b)  We  can  construct  a  path  from  a  walk  by  jumping  over  pieces  that  form  cycles.  Thus  the  shortest 
walk from i to j is a path. Here's a more formal argument. Suppose that $W = (i, t, \ldots, v, j)$ is
the  shortest  walk  from  i  to  j.  If  it  is  not  a  path,  then  there  must  be  repeated  vertices  in  the 
list.  Let  u  be  such  a  vertex.  Remove  all  vertices  from  the  sequence  after  the  first  occurrence  of 
u  up  to  and  including  the  last  occurrence  of  u.  The  result  is  a  shorter  walk,  contradicting  the 
minimality  of  W. 

(c) The obvious idea is to repeat the previous statement with i = j: "The shortest walk from i to
i is a cycle." This is not true. If {i, j} is an edge, then i, j, i is the shortest nontrivial walk from i to i but
it is not a cycle. The result would be true if we were looking at oriented simple graphs because
an edge can be traversed in only one direction. All we can claim is that any odd length walk
from i to i contains a cycle.

We can modify the situation a bit by looking at an edge {i, j} of the graph. Let H be
the graph obtained by removing it; i.e., by setting $a_{i,j} = a_{j,i} = 0$. The shortest walk from j to
i in H together with the edge {i, j} is a cycle of G. This follows from the previous result and
the definitions of path and cycle.

(d) Following the hint, $(I + A)^k = \sum_{t=0}^{k} \binom{k}{t} A^t$ by the binomial theorem. Since $\binom{k}{t} > 0$, $b^{(k)}_{i,j}$ is nonzero
if and only if $a^{(t)}_{i,j} \ne 0$ for some t with $0 \le t \le k$. Taking t = 0 gives the identity matrix, so $b^{(k)}_{i,i} \ne 0$ for
all k. For $i \ne j$, $b^{(k)}_{i,j} \ne 0$ if and only if there is a walk from i to j of length t for some $t \le k$, and thus if
and only if there is a path of length t for some $t \le k$. Since paths of length t contain t + 1 distinct vertices,
no path is longer than n − 1. Thus there is a path from i to $j \ne i$ if and only if $b^{(k)}_{i,j} \ne 0$ for all
$k \ge n - 1$.
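
The criterion in (d) translates directly into a connectivity test. The sketch below is illustrative only: it assumes a simple graph given by a 0/1 adjacency matrix, and the names `reachable_matrix` and `is_connected` are hypothetical. It forms $(I + A)^{n-1}$ and reads off which entries are nonzero.

```python
def reachable_matrix(A):
    """Entry [i][j] is True when (I + A)^(n-1) has a nonzero (i, j) entry."""
    n = len(A)
    B = [[A[i][j] + (i == j) for j in range(n)] for i in range(n)]  # I + A
    P = [[int(i == j) for j in range(n)] for i in range(n)]         # identity
    for _ in range(n - 1):
        P = [[sum(P[i][t] * B[t][j] for t in range(n)) for j in range(n)]
             for i in range(n)]
    return [[P[i][j] != 0 for j in range(n)] for i in range(n)]

def is_connected(A):
    """A graph is connected when every pair of vertices is joined by a path."""
    return all(all(row) for row in reachable_matrix(A))

# Two small examples (assumptions for illustration).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
two_edges = [[0, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 1],
             [0, 0, 1, 0]]
```

Repeated multiplication is not the fastest way to test connectivity; it is used here only because it mirrors the matrix argument.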

5.6.5.  We  claim  that  A{D)  is  nilpotent  if  and  only  if  there  is  no  vertex  i  such  that  there  is  a  walk 
from  i  to  i  (except  the  trivial  walk  consisting  of  just  i). 

First suppose that there is a nontrivial walk from i to i containing k edges. Let $C = A(D)^k$. It
follows that all entries of C are nonnegative and $c_{i,i} \ne 0$. Thus $c^{(m)}_{i,i} \ne 0$ for all m > 0. Hence A(D)
is not nilpotent.

Conversely, suppose that A(D) is not nilpotent. Let n be the number of vertices in D and
suppose that i and j are such that $a^{(n)}_{i,j} \ne 0$, which we can do since A(D) is not nilpotent. There
must be a walk $i = v_0, v_1, v_2, \ldots, v_n = j$. Since this sequence contains n + 1 vertices, there must be
a repeated vertex. Suppose that k < l and $v_k = v_l$. The sequence $v_k, v_{k+1}, \ldots, v_l$ is a nontrivial walk
from $v_k$ to itself.
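
The equivalence can be tested exhaustively on small digraphs. This is a sketch under stated assumptions: the digraph is a 0/1 matrix (loops allowed), nilpotency is tested by checking $A(D)^n = 0$ (the power used in the converse argument above), and closed walks are detected by a depth-first search. The function names are illustrative.

```python
def is_nilpotent(A):
    """Check whether A^n is the zero matrix, where A is n x n."""
    n = len(A)
    P = [row[:] for row in A]
    for _ in range(n - 1):
        P = [[sum(P[i][t] * A[t][j] for t in range(n)) for j in range(n)]
             for i in range(n)]
    return all(entry == 0 for row in P for entry in row)

def has_closed_walk(A):
    """Detect a nontrivial walk from some vertex to itself (a directed cycle or loop)."""
    n = len(A)
    state = [0] * n  # 0 = unvisited, 1 = on the current search path, 2 = finished

    def visit(u):
        state[u] = 1
        for v in range(n):
            if A[u][v]:
                if state[v] == 1 or (state[v] == 0 and visit(v)):
                    return True
        state[u] = 2
        return False

    return any(state[u] == 0 and visit(u) for u in range(n))

# Examples (assumptions for illustration): an acyclic digraph and a directed 3-cycle.
dag = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
```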
Section 6.1

6.1.1  (a)  One  description  of  a  tree  is:  a  connected  graph  such  that  removal  of  any  edge  disconnects 
the  tree.  Since  an  edge  connects  only  two  vertices,  we  will  obtain  only  two  components  by 
removing  it. 

(b) Note that T with e removed and f added is a spanning tree. Since T has minimum weight, the
result follows.

(c) The graph must have a cycle containing e. Since one end of e is in $T_1$ and the other in $T_2$, the
cycle must contain another connector besides e.

(d) Since $T^*$ with e removed and f added is a spanning tree, the algorithm would have removed f
instead of e if $\lambda(f) > \lambda(e)$.

(e) By (b) and (d), $\lambda(f) = \lambda(e)$. Since adding f connects $T_1$ and $T_2$, the result is a spanning tree.

(f) Suppose $T^*$ is not a minimum weight spanning tree. Let T be a minimum weight spanning tree
so that the event in (a) occurs as late as possible. It was proven in (e) that we can replace T
with another minimum weight spanning tree such that the disagreement between T and $T^*$, if
any, occurs later in the algorithm. This contradicts the definition of T.

6.1.3 (b) Let $Q_1$ and $Q_2$ be two bicomponents of G, let $v_1$ be a vertex of $Q_1$, and let $v_2$ be a vertex
of $Q_2$. Since G is connected, there is a path in G from $v_1$ to $v_2$, say $x_1, \ldots, x_p$. You should
convince yourself that the following pseudocode constructs a walk $w_1, w_2, \ldots$ in B(G) from $Q_1$
to $Q_2$.

    Set w_1 = Q_1, j = 2, and k = 0.
    While there is an x_i in P(G) with i > k
        Let i > k be the least i for which x_i is in P(G).
        If i = p
            Set Q = Q_2.
        Else
            Let Q be the bicomponent containing {x_i, x_{i+1}}.
        End if
        Set w_j = x_i, w_{j+1} = Q, k = i, and j = j + 2.
    End while

(c) Suppose there is a cycle in B(G), say $v_1, Q_1, \ldots, v_k, Q_k, v_1$, where the $Q_i$ are distinct bicomponents
and the $v_i$ are distinct vertices. Set $v_{k+1} = v_1$. By the definitions, there is a path in $Q_i$
from $v_i$ to $v_{i+1}$. Replace each $Q_i$ in the previous cycle with these paths after removing the endpoints
$v_i$ and $v_{i+1}$ from the paths. The result is a cycle in G. Since this is a cycle, all vertices
on it lie in the same bicomponent, which is a contradiction since the original cycle contained
more than one $Q_i$.

(d) Let v be an articulation point of the simple graph G. By definition, there are vertices x and
y such that every path from x to y contains v. From this one can prove that there are edges
e = {v, x'} and f = {v, y'} such that every path from x' to y' contains v. It follows that e and
f are in different bicomponents. Thus v lies in more than one bicomponent.

Suppose that v lies in two bicomponents. There are edges e = {v, w} and f = {v, z} such
that $e \not\sim f$. It follows that every path from w to z contains v and so v is an articulation point.
6.1.5 (a) Since there are no cycles, each component must be a tree. If a component has $n_i$ vertices,
then it has $n_i - 1$ edges since it is a tree. Since $\sum n_i$ over all components is n and $\sum (n_i - 1)$
over all components is k, n − k is the number of components.

(b) By the previous part, $H_{k+1}$ has one less component than $G_k$ does. Thus at least one component
C of $H_{k+1}$ has vertices from two or more components of $G_k$. By the connectivity of C, there
must be an edge e of C that joins vertices from different components of $G_k$. If this edge is
added to $G_k$, no cycles arise.

(c) By the definition of the algorithm, it is clear that $\lambda(g_1) \le \lambda(e_1)$. Suppose that $\lambda(g_i) \le \lambda(e_i)$
for $1 \le i \le k$. By the previous part, there is some $e_j$ with $1 \le j \le k + 1$ such that $G_k$ together
with $e_j$ has no cycles. By the definition of the algorithm, it follows that $\lambda(g_{k+1}) \le \lambda(e_j)$. Since
$\lambda(e_j) \le \lambda(e_{k+1})$ by the definition of the $e_i$'s, we are done.
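
The inequality in (c) can be observed on a small example. The sketch below assumes the algorithm in question is the usual greedy one (repeatedly add the lightest edge that creates no cycle); that reading, the union-find helper, and the example weights are assumptions made for illustration rather than details taken from the text.

```python
def greedy_spanning_forest(n, edges):
    """Greedy edge selection on vertices 0..n-1.

    edges is a list of (weight, u, v). Returns the chosen edges g_1, g_2, ...
    in the order they are added.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the component representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:              # adding {u, v} creates no cycle
            parent[ru] = rv
            chosen.append((w, u, v))
    return chosen

# Hypothetical weighted graph on 4 vertices.
example_edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
g = greedy_spanning_forest(4, example_edges)

# Another spanning tree, with its edges e_1, e_2, e_3 sorted by weight.
e = sorted([(3, 0, 2), (4, 2, 3), (5, 1, 3)])
```

Comparing `g` and `e` position by position shows each greedy weight is at most the corresponding weight of the other tree, which is the statement proved by induction above.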

6.1.7  (a)  Hint:  For  (1)  there  are  four  spanning  trees.  For  (2)  there  are  8  spanning  trees.  For  (3) 
there  are  16  spanning  trees. 

(b)  Hint:  For  (1)  there  is  one.  For  (2)  there  are  two.  For  (3)  there  are  two. 

(c)  Hint:  For  (1)  there  are  two.  For  (2)  there  are  four.  For  (3)  there  are  6. 

(d) Hint: For (1) there are two. For (2) there are three. For (3) there are 6.

6.1.9  (a)    Hint:  There  are  21  vertices,  so  the  minimal  spanning  tree  has  20  edges.  Its  weight  is  30. 

(b) Hint: Its weight is 30.

(c)  Hint:  Its  weight  is  30. 

(d) Hint: Note that X is the only vertex in common to the two bicomponents of this graph.
Whenever  this  happens  (two  bicomponents,  common  vertex),  the  depth-first  spanning  tree 
rooted  at  that  common  vertex  has  exactly  two  "principal  subtrees"  at  the  root.  In  other  words, 
the  root  of  the  depth-first  spanning  tree  has  degree  two.  Finding  depth  first  spanning  trees  of 
minimal  weight  is,  in  general,  difficult.  You  might  try  it  on  this  example. 

Section  6.2 

6.2.1.  This  is  just  a  matter  of  a  little  algebra. 

6.2.3 (a) To color G, first color the vertices of H AND then color the vertices of K. By the Rule
of Product, $P_G(x) = P_H(x)P_K(x)$.

(b) Let v be the common vertex. There is an obvious bijection between pairs of colorings $(\lambda_H, \lambda_K)$
of H and K with $\lambda_H(v) = \lambda_K(v)$ and colorings of G. We claim the number of such pairs is
$P_H(x)(P_K(x)/x)$. To see this, note that, in the colorings of K counted by $P_K(x)$, each of the
x ways to color v occurs equally often and so 1/x of the colorings will have $\lambda_K(v)$ equal to the
color given by $\lambda_H(v)$.

(c) The answer is $P_H(x)P_K(x)(x-1)/x$. We can prove this directly, but we can also use (b) and
(6.4) as follows. Let e = {v, w}. By the construction of G, $P_{G-e}(x) = P_H(x)P_K(x)$. By (b),
$P_{G_e}(x) = P_H(x)P_K(x)/x$. Now apply (6.4).
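
The three formulas can be sanity-checked by brute force on small graphs. In this sketch the graphs (a triangle for H, a three-vertex path for K), the vertex numbering and the function names are all illustrative assumptions; colorings are counted by trying every assignment of x colors.

```python
from itertools import product

def count_colorings(n, edges, x):
    """Number of proper colorings of vertices 0..n-1 with x colors (brute force)."""
    return sum(
        all(c[u] != c[v] for u, v in edges)
        for c in product(range(x), repeat=n)
    )

H = [(0, 1), (1, 2), (0, 2)]    # triangle on vertices 0, 1, 2
K_shared = [(2, 3), (3, 4)]     # path 2-3-4, sharing vertex 2 with H
K_apart = [(3, 4), (4, 5)]      # the same path on vertices 3, 4, 5, disjoint from H
bridge = [(2, 3)]               # an edge joining H to the disjoint copy of K

def check(x):
    """Return the truth of the identities in (a), (b) and (c) for x colors."""
    p_h = count_colorings(3, H, x)
    p_k = count_colorings(3, [(0, 1), (1, 2)], x)
    disjoint = count_colorings(6, H + K_apart, x)
    shared = count_colorings(5, H + K_shared, x)
    joined = count_colorings(6, H + K_apart + bridge, x)
    return (disjoint == p_h * p_k,
            shared * x == p_h * p_k,
            joined * x == p_h * p_k * (x - 1))
```

The identities in (b) and (c) are multiplied through by x so that the comparison uses integers only.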

6.2.5. Let the solution be $P_n(x)$. Clearly $P_1(x) = x(x-1)$, so we may suppose that $n \ge 2$. Apply
deletion and contraction to the edge {(1,1), (1,2)}. Deletion gives a ladder with two ends sticking out
and so its chromatic polynomial is $(x-1)^2 P_{n-1}(x)$. Contraction gives a ladder with the contracted
vertex joined to two adjacent vertices. Once the ladder is colored, there are x − 2 ways to color the
contracted vertex. Thus we have

$$P_n(x) = (x-1)^2 P_{n-1}(x) - (x-2)P_{n-1}(x) = (x^2 - 3x + 3)P_{n-1}(x).$$

The value for $P_n(x)$ now follows easily.
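
Unwinding the recursion from $P_1(x) = x(x-1)$ gives $P_n(x) = x(x-1)(x^2-3x+3)^{n-1}$. The following sketch checks that closed form by brute force for small ladders. The vertex labels (i, j), with i the side and j the rung position, are an assumption about how the ladder is drawn; the function names are illustrative.

```python
from itertools import product

def ladder_edges(n):
    """Edges of a ladder with n rungs on vertices (i, j), i in {1, 2}, j in 1..n."""
    rungs = [((1, j), (2, j)) for j in range(1, n + 1)]
    rails = [((i, j), (i, j + 1)) for i in (1, 2) for j in range(1, n)]
    return rungs + rails

def count_colorings(vertices, edges, x):
    """Number of proper colorings with x colors, by trying every assignment."""
    index = {v: t for t, v in enumerate(vertices)}
    return sum(
        all(c[index[u]] != c[index[v]] for u, v in edges)
        for c in product(range(x), repeat=len(vertices))
    )

def ladder_formula(n, x):
    """Closed form obtained from the recursion above."""
    return x * (x - 1) * (x * x - 3 * x + 3) ** (n - 1)

def ladder_count(n, x):
    vertices = [(i, j) for i in (1, 2) for j in range(1, n + 1)]
    return count_colorings(vertices, ladder_edges(n), x)
```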
6.2.7. The answer is

$$x^8 - 12x^7 + 66x^6 - 214x^5 + 441x^4 - 572x^3 + 423x^2 - 133x.$$

There seems to be no really easy way to derive this. Here's one approach which makes use of
Exercise 6.2.3 and $P_{Z_n}(x)$ for n = 3, 4, 5. Label the vertices reading around one face with a, b, c, d
and around the opposite face with A, B, C, D so that {a, A} is an edge, etc. If the edge {a, A} is
contracted, call the new vertex α. Introduce β, γ and δ similarly.

Let $e_1 = \{a, A\}$ and $e_2 = \{b, B\}$. Note that $G - e_1 - e_2$ consists of three squares joined by
common edges and that $H = G_{e_1} - e_2$ is equivalent to $(G - e_1)_{e_2}$. We do H in the next paragraph.
In $K = G_{e_1 e_2}$, let f = {α, β}. $K - f$ is two triangles and a square joined by common edges and $K_f$
is a square and a vertex v joined to the vertices of the square. By first coloring v and then the square,
we see that $P_{K_f}(x) = x P_{Z_4}(x - 1)$.

Let  /i  =  {c,  G},  /2  =  {d,  D}  and  h  =  {P,  l}-  Then 

•  H  —  fi  —  f2  is  two  Z5S  sharing  /3; 

•  (i?  —         is  easy  to  do  if  you  consider  two  cases  depending  on  whether  (3  and  S  have  the  same 
or  different  colors,  giving  x{x  —  l){x  —  2)"'  +  x{x  —  1)"*; 

•  iJ/i  —  /a  is  a      and  a  triangle  with  a  common  edge  and 

•  Hf^f^  are  three  triangles  joined  by  common  edges. 

6.2.9. This can be done by induction on the number of edges. The starting situation involves some
number n of vertices with no edges. Since the chromatic polynomial is $x^n$, the result is proved for
the starting condition.

Now for the induction. Deletion does not change the number of vertices, but reduces the number
of edges. By induction, it gives a polynomial for which the coefficient of $x^k$ is a nonnegative multiple
of $(-1)^{n-k}$. Contraction decreases both the number of vertices and the number of edges by 1 and so
gives a polynomial for which the coefficient of $x^k$ is a nonnegative multiple of $(-1)^{n-1-k}$. Subtracting
the two polynomials gives one where the coefficient of $x^k$ is a nonnegative multiple of $(-1)^{n-k}$.


Section  6.3 

6.3.1. Every face must contain at least four edges and each side of an edge contributes to a face.
Thus $4f \le$ (edge sides) $= 2e$. From Euler's relation,

$$2 = v - e + f \le v - e + e/2 = (2v - e)/2$$

and so $e \le 2v - 4$.

6.3.3 (a) We have $2e = f d_f$ and $2e = v d_v$. Use this to eliminate v and f in Euler's relation.

(b) They are cycles.

(c) If $d_f \ge 4$ and $d_v \ge 4$, we would have $0 < 2/d_f + 2/d_v - 1 \le 0$, a contradiction. Thus at least
one of $d_v$ and $d_f$ is 3. Since $d_v \ge 3$, we have $2/d_v \le 2/3$. Thus

$$0 < \frac{2}{d_f} + \frac{2}{d_v} - 1 \le \frac{2}{d_f} - \frac{1}{3}$$

and so $d_f < 2/(1/3) = 6$. Since $d_f$ is an integer, $d_f \le 5$. Since $d_f \ge 3$ for a simple graph,
interchanging f and v in the above gives us $d_v \le 5$.

(d) Altogether there are 5 possibilities for the pair $(d_v, d_f)$ by the previous part of the exercise.
Given $d_f$ and $d_v$, we can solve (6.9) for e. Then $v d_v = 2e$ and $f d_f = 2e$ give v and f. The five
graphs turn out to be the Platonic solids with the interiors removed. (They are the tetrahedron,
cube, octahedron, dodecahedron and icosahedron.)
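
The five cases can be enumerated mechanically. The sketch below assumes that the relation being solved is the one obtained in (a) by substituting $v = 2e/d_v$ and $f = 2e/d_f$ into Euler's relation, namely $e(2/d_v + 2/d_f - 1) = 2$; whether this is exactly the form of (6.9) is not visible here, so treat that as an assumption. The function name and the search bound are illustrative.

```python
from fractions import Fraction

def regular_planar_parameters(max_degree=10):
    """List (d_v, d_f, v, e, f) for every pair of degrees >= 3 allowed by Euler's relation."""
    solutions = []
    for d_v in range(3, max_degree + 1):
        for d_f in range(3, max_degree + 1):
            # From v - e + f = 2 with v = 2e/d_v and f = 2e/d_f: this quantity equals 2/e.
            excess = Fraction(2, d_v) + Fraction(2, d_f) - 1
            if excess > 0:
                e = 2 / excess
                v = 2 * e / d_v
                f = 2 * e / d_f
                solutions.append((d_v, d_f, int(v), int(e), int(f)))
    return solutions
```

The five tuples returned correspond, in order, to the tetrahedron, cube, dodecahedron, octahedron and icosahedron.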
6.3.5. The value of c is zero. Suppose when we cut as directed we cut through k edges. Each of
these edges now becomes two, giving us k new edges. The same happens with the k faces. On each
of the circles that we fill in, we also get k edges and k vertices. The two circles give us 2 new
faces. In summary, if we originally had |V| vertices, |E| edges and f faces on the torus, we now have
a graph embedded on the sphere with |V| + 2k vertices, |E| + k + 2k edges, and f + k + 2 faces. From
Euler's relation on the sphere,

$$2 = (|V| + 2k) - (|E| + 3k) + (f + k + 2) = |V| - |E| + f + 2.$$

Thus |V| − |E| + f = 0.

There's  a  subtle  issue  here:  We  described  the  cut  as  if  each  edge  and  face  it  encountered  was 
different. This may not be the case: an edge (and face) can twist around the torus so that the cut
meets  it  more  than  once;  however,  the  counts  are  still  correct.  One  way  to  see  this  is  to  imagine 
what  happens  if  we  cut  around  the  face  and  stretch  it  flat.  Stretching  will  distort  our  "bracelet  cut" 
into  some  sort  of  curve  that  may  cut  through  the  face  several  times.  Every  time  it  passes  through 
the  face  it  creates  another  face,  two  edges  and  two  vertices. 

6.3.7. One method is to list all the simple planar graphs with V = {1, ..., 5} and find the least colorings
for them. We use a theoretical argument instead.

The lex least proper coloring of {1, ..., k} ⊆ V uses at most the first k colors. If it uses all k colors,
then vertex k must be connected to each of the other vertices and the first k − 1 vertices must use
all of the first k − 1 colors.

Let's apply these observations with k = 5, 4, 3 and 2 to a graph whose lex least coloring takes
5 colors. With k = 5, we see that vertex 5 is connected to each of the first 4 vertices and they use
4 different colors. Now, with k = 4, we see that vertex 4 is connected to each of the first 3 vertices
and they use 3 different colors. Doing the same thing with k = 3 and k = 2, we finally see that every
vertex is connected to every other; i.e., the graph is $K_5$, which is not planar.

6.3.9. The argument for degree 4 is correct. For degree 5, we can assume, perhaps after rotating
and/or flipping the graph, that $y_1, \ldots, y_5$ are assigned colors $c_1$, $c_2$, $c_3$, $c_4$ and $c_2$, respectively.
Suppose we look at $y_1$ and $y_3$ as in the text. The argument given there is okay if we get $y_1$ and $y_3$
in separate components. If they are in the same component, we end up switching colors $c_2$ and $c_4$
in the component of the subgraph colored by $c_2$ and $c_4$ that contains $y_4$. The colors of $y_1, \ldots, y_4$
are now $c_1$, $c_2$, $c_3$ and $c_2$. If $y_5$ was not in the same component with $y_4$, it is colored $c_2$ and we are
done. Unfortunately, if $y_4$ and $y_5$ are in the same component, its color is switched to $c_4$. You should
convince yourself that there is no way to arrange things to avoid this possibility.

6.3.11  (a)    We  start  with  A  =  A  cycle  is  (1,2,3,4,7),  so  we  now  have  A  = 

(1234^^7)-  Another  cycle  is  (1,2,5,6,7),  so  we  look  at  the  path  2,5,6,7  and  choose 
A=  (1234067)  2  <  a  <  6  <  7  and  A  is  an  injection.  Depending  on  the  choice  of 

a and b compared to 3 and 4, we get five different 1,7-labelings. There could be others.

(b) Any 1,7-labeling can be converted to a 7,1-labeling simply by defining $\lambda_{7,1}(x) = 8 - \lambda_{1,7}(x)$.

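6.3.13. We'll find all s,t-labelings. Suppose $K_{3,3}$ consists of all possible edges between 1, 2, 3 and
a, b, c. By symmetry, we may assume that $\lambda(1) = 1$ and either $\lambda(2) = 6$ or $\lambda(a) = 6$. In the former
case, condition (c) requires that $\lambda(3)$ be less than 5 and more than 2. Up to symmetry, this gives us
two answers:

$$\lambda = \begin{pmatrix} 1 & 2 & 3 & a & b & c \\ 1 & 6 & 3 & 2 & 4 & 5 \end{pmatrix} \quad\text{and}\quad \lambda = \begin{pmatrix} 1 & 2 & 3 & a & b & c \\ 1 & 6 & 4 & 2 & 3 & 5 \end{pmatrix}.$$

Now suppose $\lambda(a) = 6$. In this case, one of $\lambda(2)$ and $\lambda(3)$ must be greater than $\lambda(b)$ and $\lambda(c)$. Thus,
up to symmetry, we have $\lambda(2) = 5$. Similarly $\lambda(b) = 2$. This leads to two more answers:

$$\lambda = \begin{pmatrix} 1 & 2 & 3 & a & b & c \\ 1 & 5 & 3 & 6 & 2 & 4 \end{pmatrix} \quad\text{and}\quad \lambda = \begin{pmatrix} 1 & 2 & 3 & a & b & c \\ 1 & 5 & 4 & 6 & 2 & 3 \end{pmatrix}.$$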
6.3.15. We know from the text that a biconnected graph has an st-labeling. If |V| = 2, the result
is trivial. Suppose that we have an st-labeling and that {x, y} is an edge different from {s, t}. We
may assume that $\lambda(x) < \lambda(y)$. By (iii) in the definition of st-labeling, we can find a sequence
$y = w_1, w_2, \ldots = t$ such that $\lambda(w_i)$ is strictly increasing and such that $\{w_i, w_{i+1}\} \in E$. Similarly, we
can find $x = u_1, u_2, \ldots = s$. These two paths with {s, t} and {x, y} form a cycle of G and so {x, y}
and {s, t} are in the same bicomponent.

Section  6.4 

6.4.1 (a) The value of a maximum flow is 45. Every maximum flow f will have f(q, f) = 10. Some
other values of f are also determined uniquely, but many are not; for example, the flow into
r can have any value from 15 to 20. Of course, the flows on the minimum cut set are unique.
There are four minimum cut sets. The one found using A(f) is

{{r,  h},  {/,  a},  {k,  e},  {y,  u},  {z,  u}}. 

The  others  are  obtained 

(i)  by  deleting  {r,  h}  and  adding  {h,  a}  and  {h,  c}, 

(ii)  by  deleting  {y,  u}  and  {z,  u}  and  adding  {u,  n},  or 

(iii)  by  doing  both  (i)  and  (ii). 

(b)  See  the  previous  solution. 

(c) The value of a maximum flow is 25. Every maximum flow f will have f(v, q) = 10. Some other
values of f are also determined uniquely, but many are not. There is just one minimum cut
set: 

{{c,  d},  {k,  e},  {r,  x},  {w,  x}}. 

(d)  See  the  previous  solution.  Since  we  do  not  have  tools  for  finding  all  minimum  cut  sets,  you 
may  not  have  been  able  to  prove  that  the  minimum  cut  set  was  unique. 

6.4.3. Since no complete augmentable path exists, $D_{in} \subseteq A \subseteq V - D_{out}$. Since b(v) = 0 for $v \notin D_{in} \cup D_{out}$, it
follows that $\sum_{v \in A} b(v) = \sum_{v \in D_{in}} b(v)$, which is the definition of the value of a flow. Recall that b(v)
is the sum of all flows out of v minus the sum of all flows into v. It follows that for $e = (x, y) \in E$,
b(x) has a contribution of f(e) and b(y) has a contribution of −f(e). We distinguish four cases
according as x and y are in A or B and ask what f(e) contributes to $\sum_{v \in A} b(v)$.

(i) $x \in B$, $y \in B$: Then f(e) contributes nothing to the sum.

(ii) $x \in A$, $y \in A$: Then f(e) contributes both f(e) and −f(e), which gives a net contribution
of zero.

(iii) $x \in A$, $y \in B$; i.e., $e \in \mathrm{FROM}(A, B)$: Then f(e) contributes f(e) to the sum.

(iv) $x \in B$, $y \in A$; i.e., $e \in \mathrm{FROM}(B, A)$: Then f(e) contributes −f(e) to the sum.
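
The four cases say that summing b(v) over A leaves only the flow from A to B minus the flow from B to A. The following sketch checks this bookkeeping on a small network; the network, its flow values and the function names are hypothetical and chosen only for illustration.

```python
def net_outflow(flow, v):
    """b(v): total flow out of v minus total flow into v."""
    out_flow = sum(f for (x, y), f in flow.items() if x == v)
    in_flow = sum(f for (x, y), f in flow.items() if y == v)
    return out_flow - in_flow

def cut_flow(flow, A):
    """Flow on edges in FROM(A, B) minus flow on edges in FROM(B, A), with B the complement of A."""
    forward = sum(f for (x, y), f in flow.items() if x in A and y not in A)
    backward = sum(f for (x, y), f in flow.items() if x not in A and y in A)
    return forward - backward

# Hypothetical flow from source "s" to sink "t"; flow is conserved at "a" and "b".
flow = {("s", "a"): 2, ("s", "b"): 1, ("a", "b"): 1, ("a", "t"): 1, ("b", "t"): 2}

cuts = [{"s"}, {"s", "a"}, {"s", "b"}, {"s", "a", "b"}]
```

For each set A in `cuts`, the sum of `net_outflow` over A equals `cut_flow(flow, A)`, and both equal the value of the flow.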

6.4.5 (a) Without examining the network in detail, we would need to let $c'_1$ and $c'_2$ (resp. $c'_3$ and $c'_4$)
be the sum of the capacities of edges leaving (resp. entering) the corresponding $P'_i$. That way
we can guarantee the capability of supplying (resp. removing) as much fluid as the pump could
possibly send out to (resp. get in from) other sources. If we know all the maximum flows
for the original network, we may be able to improve on this: We need to set $c'_i$ to the largest
net flow out of (resp. into) $D_i$ for all maximum flows in the original network. This leads to no
improvement in this case.
(b) Yes. Let f' be a flow in the new network shown for the exercise. Removing the $c'_i$ edges
from f' and converting the $P'_i$ pumps back to depots, we obtain a
flow f in the network of Figure 6.6. We'll have value(f) = value(f') because the sum of the net
flows out of $D_1$ and $D_2$ for f equals the net flow out of $D_0$ for f', since $b(P'_1) = b(P'_2) = 0$
for f'.

6.4.7. Let f and g be two maximum flows and let A = A(f). By the proof of the Augmentable
Path Theorem, we see that value(g) = value(f) if and only if g(e) = c(e) for all $e \in \mathrm{FROM}(A, B)$
and g(e) = 0 for all $e \in \mathrm{FROM}(B, A)$. It is tempting to conclude that therefore A = A(g), but this
does not follow immediately.

Here is a correct proof. As above, let A = A(f). If $A \ne A(g)$, we can assume that there is some
$v \in A(g)$ with $v \notin A$. (If not, interchange the names of f and g.) Let $u_1, u_2, \ldots$ be an augmentable
path for g that ends at v. Let δ be its increment. Since $v \notin A$ and $u_1 \in D_{in} \subseteq A$, there is an i
with $u_i \in A$ and $u_{i+1} \in B$. If $e = (u_i, u_{i+1})$ is the directed edge of G, then $g(e) \le c(e) - \delta$ and
$e \in \mathrm{FROM}(A, B)$. If $e = (u_{i+1}, u_i)$ is the directed edge of G, then $g(e) \ge \delta$ and $e \in \mathrm{FROM}(B, A)$.
In either case, the idea in the previous paragraph proves that value(g) < value(f), contradicting the
assumption that g is a maximum flow.

6.4.9  (a)    This  is  trivial. 

(b) Consider the sets when a is removed from them and the set $A_n$ is removed. We have reduced
n by 1 and (6.12) still holds (but the inequalities may not be strict). By induction, we are
done.

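(c) By induction, there is an SDR for the $A_i$, $i \in I$. If the claimed inequality is true, then there
is also an SDR for the $B_i$, $i \in \{1, \ldots, n\} - I$. Taken together, these give us our representatives. It
remains to prove the inequality. We have

$$\bigcup_{i \in R \cup I} A_i = \Bigl(\bigcup_{i \in R} B_i\Bigr) \cup X,$$

where the last union is disjoint. Thus

$$\Bigl|\bigcup_{i \in R} B_i\Bigr| = \Bigl|\bigcup_{i \in R \cup I} A_i\Bigr| - |X| \ge |R \cup I| - |I| = |R|.$$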


6.4.11. The result in the previous exercise is valid when all edges are taken to be undirected. To
see this, construct a directed graph by replacing each edge {x, y} of G with the two edges (x, y) and
(y, x). The first part of the previous proof goes through. If a directed path $e_1, e_2, \ldots$ is constructed
from a flow, replace each edge (x, y) in the directed path with {x, y}. This gives what we will call
a pseudo-path. The same edge may appear twice in the pseudo-path because there may be two
directed edges $e_i = (x, y)$ and $e_j = (y, x)$ which give the same undirected edge. We may assume that
i < j. Replace the pseudo-path with the pseudo-path obtained from $e_1, \ldots, e_{i-1}, e_{j+1}, \ldots$. Iterating
this process eventually leads to a path from u to v. (You may want to fill in some details about
that.)
Section 6.5

6.5.1 (a) The probability that a vertex v has degree d is $\binom{n-1}{d} p^d (1-p)^{n-1-d}$ since we must choose
d of the remaining n − 1 vertices to connect to v, then multiply by the probability of an edge
being present (p) or absent (1 − p). Probabilities multiply since edges are independent in $\mathcal{G}_p(n)$.
Using linearity of expectation and summing over all n vertices, we get $n\binom{n-1}{d} p^d (1-p)^{n-1-d}$.

(b) If C is a potential 4-cycle of 4 vertices, let $X_C = 1$ if the cycle is present and $X_C = 0$ if it is not.
Then $E(X_C) = p^4$. We must multiply this by the number of choices for C; that is, the number
of potential 4-cycles. This number is $\binom{n}{4} \times 3 = \frac{n(n-1)(n-2)(n-3)}{8}$, which can be derived in at
least two ways:

•  Note that there are 3 ways to make a 4-cycle out of a set of 4 vertices.

•  Choose an ordered list of 4 vertices that represent walking around a cycle. There are 4
vertices that could have been chosen as the starting vertex and 2 ways we could have
gone around the cycle.

(c) This is the same as the previous situation, except that now we must make sure the two edges
that cut across the 4-cycle are not present. Hence the answer is $3\binom{n}{4} p^4 (1-p)^2$.
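
For small n these expectations can be confirmed exactly by summing over every graph on n vertices, weighting each by its probability. The sketch below does this with exact rational arithmetic; the function names are illustrative, and the enumeration is only feasible for very small n since it visits $2^{\binom{n}{2}}$ graphs.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def expectation(n, p, statistic):
    """Exact expected value of statistic(n, edges) when each edge is present with probability p."""
    pairs = list(combinations(range(n), 2))
    total = Fraction(0)
    for present in product([False, True], repeat=len(pairs)):
        edges = {pair for pair, on in zip(pairs, present) if on}
        k = len(edges)
        weight = p ** k * (1 - p) ** (len(pairs) - k)
        total += weight * statistic(n, edges)
    return total

def degree_count(d):
    """Statistic counting the vertices of degree d."""
    def statistic(n, edges):
        return sum(1 for v in range(n) if sum(v in e for e in edges) == d)
    return statistic

def has_edge(edges, u, v):
    return (min(u, v), max(u, v)) in edges

def four_cycles(n, edges, induced=False):
    """Count 4-cycles; with induced=True, require both chords to be absent."""
    count = 0
    for a, b, c, d in combinations(range(n), 4):
        # The 3 distinct 4-cycles on {a, b, c, d}, each listed in cyclic order.
        for w, x, y, z in ((a, b, c, d), (a, b, d, c), (a, c, b, d)):
            sides = [(w, x), (x, y), (y, z), (z, w)]
            chords = [(w, y), (x, z)]
            if all(has_edge(edges, u, v) for u, v in sides):
                if not induced or not any(has_edge(edges, u, v) for u, v in chords):
                    count += 1
    return count

def induced_four_cycles(n, edges):
    return four_cycles(n, edges, induced=True)
```

With, say, `p = Fraction(1, 3)` and n = 5, `expectation(5, p, four_cycles)` can be compared against $3\binom{5}{4}p^4$, and similarly for the other two parts.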

6.5.3  (a)  The  probability  of  a  cycle  is  the  probability  of  the  union  of  the  sets  Qc-  The  probability 
of the union of sets is less than or equal to the sum of their separate probabilities; that is,
$\Pr(A \cup B \cup \cdots) \le \Pr(A) + \Pr(B) + \cdots$.

(b)  The  denominator  is  |0(n,       The  numerator  counts  graphs  as  follows.  There  are  (c  —  1)! 

directed  cycles.  Since  each  cycle  can  be  made  directed  in  two  ways,  there  are  (c—  l)!/2  cycles. 
Since  we  have  used  up  c  edges  making  the  cycle,  we  must  choose  k  —  c  edges  from  the  remaining 
N  —  c  unused  edges. 

(c) Collect terms in (a) according to c = |C| and use (b). There are $\binom{n}{c}$ c-subsets of {1, ..., n}.

(d)  The  left  side  comes  from  writing  ( ^ )  =  and  doing  some  algebra.  The  inequality 
comes  from  j^f^  <  k"  and  |5|  <  |  when  y>x>j. 

6.5.5 (a) Let T contain as close to half the vertices as possible. If |V| = 2n, |T| = n and |V − T| = n.
Since G contains all edges, this choice of T gives us a bipartite subgraph with $n^2$ edges. When
|V| = 2n + 1, we take |T| = n and |V − T| = n + 1, obtaining a bipartite subgraph with n(n + 1)
edges.

(b) The example bound is |E|/2 and $|E| = |\mathcal{P}_2(V)| = |V|(|V| - 1)/2$. For |V| = 2n, we have
$|E|/2 = n(2n - 1)/2 = n^2 - n/2$. Hence the bound is off by n/2. This may sound large, but
the relative error is small: Since $(n^2 - n/2)/n^2 = 1 - 1/2n$, the relative error is 1/2n. We omit
similar calculations for |V| = 2n + 1.

(c)  The  idea  is  to  construct  the  largest  possible  complete  graph  and  then  add  edges  in  any  manner 
whatsoever.  Let  m  be  the  largest  integer  such  that  k  >  (™),  choose  S  C  V  with  IS"!  =  m, 
construct  a  complete  graph  on  m  vertices  using  (™)  edges,  and  insert  the  remaining  k  —  (™) 
edges  in  any  manner  to  form  a  simple  graph  G(V,  E).  By  (a),  the  number  of  edges  in  a  bipartite 
subgraph  of  the  complete  graph  on  T  has  at  least  {m/2Y edges  for  some  constant  C  Since  m 

is  as  large  as  possible,  k  <  ("+^)  <  i^'^i^^.  Thus  m  +  1  >  V2fc.  Also,  since  k  >  ('2)  > 
m  —  l<  \f2k.  Hence  the  number  of  edges  in  bipartite  subgraph  is  at  least 

(m/2)2-m  >  '—-\Plk-\, 

Which  equals  k  minus  terms  involving  A;^/^  and  constants. 
Figure S.6.1 The transition table for a finite automaton that recognizes floating point numbers. The
possible inputs are sign (σ), decimal point (·), digit (δ) and exponent symbol (E). The comments explain
the states.


(d) Call the colors 0, 1, 2. Let $V_i$ be the set of vertices colored with color i and let $E_{i,j}$ be the set
of edges in G that connect vertices in $V_i$ to vertices in $V_j$. Since $|E| = |E_{0,1}| + |E_{0,2}| + |E_{1,2}|$,
at least one of the $|E_{i,j}|$ is at most |E|/3. Suppose it is $E_{1,2}$. The bipartite subgraph whose edges
connect vertices in $V_0$ to vertices in $V_1 \cup V_2$ contains $|E| - |E_{1,2}| \ge 2|E|/3$ edges.

Section  6.6 

6.6.1.  The  left  column  gives  the  input  and  the  top  row  the  states. 
6.6.3. The states are 0, O1, E1 and R. In state 0, a zero has just been seen; in O1, an odd number
of ones; in E1, an even number. The start state is 0 and the accepting states are 0 and O1. The state
R is entered when we are in E1 and see a 0. Thereafter, R always steps to R regardless of input.
You should be able to finish the machine.
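
One possible completion of the machine is sketched below. Only the transition from E1 to R on input 0 and the self-loops at R are stated above; the remaining transitions are inferred from the descriptions of the states and should be read as an assumption, as should the interpretation that the machine accepts exactly the strings in which every maximal block of ones has odd length.

```python
# Transition table: (state, input symbol) -> next state.
TRANSITIONS = {
    ("0", "0"): "0",    # another zero: still "a zero has just been seen"
    ("0", "1"): "O1",   # first one of a block: odd count
    ("O1", "0"): "0",   # an odd block of ones has ended
    ("O1", "1"): "E1",  # count of ones becomes even
    ("E1", "0"): "R",   # an even block of ones has ended: reject from now on
    ("E1", "1"): "O1",  # count of ones becomes odd again
    ("R", "0"): "R",
    ("R", "1"): "R",
}

START = "0"
ACCEPTING = {"0", "O1"}

def accepts(s):
    """Run the automaton on a string of '0' and '1' characters."""
    state = START
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING
```

For example, `accepts("0101110")` holds, while `accepts("0110")` does not, because the block `11` has even length.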

6.6.5. In our input, we let δ stand for any digit, since the transition is independent of which digit
it is. Similarly, σ stands for any sign. There is a bit of ambiguity as to whether the integer after
the E must have a sign. We assume not. The automaton contains three states that can transit to
themselves: recognizing digits before a decimal, recognizing digits after a decimal and recognizing
digits after the E. We call them 1a, 1b and 2. There is a bit of complication because of the need to
assure digits in the first part and, if it is present, in the second part. The transition table is given in
Figure S.6.1.

6.6.7  (a)   We  need  states  that  keep  track  of  how  much  money  is  held  by  the  machine.  This  leads 

us  to  states  named  0,5,...,  30.  The  output  of  the  machine  will  be  indicated  by  An,  Bn,  Cn 
and  n,  where  n  indicates  the  amount  of  money  returned  and  A,  B  and  C  indicate  the  item 
delivered.  There  may  be  no  output.  The  start  state  is  0. 


(b)   See Figure S.6.2.

Figure S.6.2  The transitions and outputs for an automaton that behaves like a vending machine. The state is the amount of money held and the input is either money, a purchase choice (A, B, C) or a refund request (R).


Section  7.1 

7.1.1.  $A(m)$ (note $m$, not $n$) is the statement of the rank formula. The inductive step and use of the inductive hypothesis are clearly indicated in the proof.

7.1.3.  Let $A(k)$ be the assertion that the coefficient of $y_1^{m_1} \cdots y_k^{m_k}$ in $(y_1 + \cdots + y_k)^n$ is $n!/(m_1! \cdots m_k!)$ if $n = m_1 + \cdots + m_k$ and 0 otherwise. $A(1)$ is trivial. We follow the hint for the induction step. Let $x = y_1 + \cdots + y_{k-1}$. By the binomial theorem, the coefficient of $x^m y_k^{m_k}$ in $(x + y_k)^n$ is $n!/(m!\,m_k!)$ if $n = m + m_k$ and 0 otherwise. By the induction hypothesis, the coefficient of $y_1^{m_1} \cdots y_{k-1}^{m_{k-1}}$ in $x^m$ is $m!/(m_1! \cdots m_{k-1}!)$ if $m = m_1 + \cdots + m_{k-1}$ and zero otherwise. Combining these results we see that the coefficient of $y_1^{m_1} \cdots y_k^{m_k}$ in $(y_1 + \cdots + y_k)^n$ is

$$\frac{n!}{m!\,m_k!} \cdot \frac{m!}{m_1! \cdots m_{k-1}!}$$

if $n = m_1 + \cdots + m_k$ and 0 otherwise.
7.1.5 (a)   $x_1'x_2' + x_1'x_2 = x_1'$.

(b)  X[X2  +  X1X2. 

(c)  $x_1'x_2'x_3 + x_1'x_2x_3 + x_1x_2'x_3' + x_1x_2x_3' = x_1'x_3 + x_1x_3'$.

(d)  $x_1'x_2'x_3 + x_1'x_2x_3' + x_1'x_2x_3 + x_1x_2x_3' = x_1'x_2 + x_1'x_3 + x_1x_2x_3'$.
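
The simplifications in parts (a), (c) and (d) can be checked by brute force over all truth assignments. The sketch below is only an illustration: the helper names `neg` and `equivalent` are hypothetical, and the formulas are typed in as read from the solution, with 0 and 1 standing for false and true.

```python
from itertools import product


def neg(x):
    """Complement of a 0/1 value (the prime in x')."""
    return 1 - x


def equivalent(f, g, nvars):
    """True if f and g agree on every 0/1 assignment of nvars variables."""
    return all(f(*v) == g(*v) for v in product([0, 1], repeat=nvars))


# Part (a): x1'x2' + x1'x2 = x1'
lhs_a = lambda x1, x2: neg(x1) & neg(x2) | neg(x1) & x2
rhs_a = lambda x1, x2: neg(x1)

# Part (c): x1'x2'x3 + x1'x2x3 + x1x2'x3' + x1x2x3' = x1'x3 + x1x3'
lhs_c = lambda x1, x2, x3: (neg(x1) & neg(x2) & x3 | neg(x1) & x2 & x3
                            | x1 & neg(x2) & neg(x3) | x1 & x2 & neg(x3))
rhs_c = lambda x1, x2, x3: neg(x1) & x3 | x1 & neg(x3)

# Part (d): x1'x2'x3 + x1'x2x3' + x1'x2x3 + x1x2x3' = x1'x2 + x1'x3 + x1x2x3'
lhs_d = lambda x1, x2, x3: (neg(x1) & neg(x2) & x3 | neg(x1) & x2 & neg(x3)
                            | neg(x1) & x2 & x3 | x1 & x2 & neg(x3))
rhs_d = lambda x1, x2, x3: neg(x1) & x2 | neg(x1) & x3 | x1 & x2 & neg(x3)
```

Calling `equivalent(lhs_c, rhs_c, 3)` compares the two sides of (c) on all eight assignments; the same call works for (a) and (d).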

7.1.7.  If you are familiar with de Morgan's laws for complementation, you can ignore the hint and give a simple proof as follows. By Example 7.3, one can express $f'$ in disjunctive form: $f' = M_1 + M_2 + \cdots$. Now $f = (f')' = M_1'M_2' \cdots$ by de Morgan's law and, if $M_i = y_1y_2 \cdots$, then $M_i' = y_1' + y_2' + \cdots$ by de Morgan's law.

To follow the hint, replace (7.5) with

$$f(x_1,\ldots,x_n) = \bigl(g_1(x_1,\ldots,x_{n-1}) + x_n'\bigr)\bigl(g_0(x_1,\ldots,x_{n-1}) + x_n\bigr)$$

and practically copy the proof in Example 7.3.

7.1.9.  We can induct on either $k$ or $n$. It doesn't matter which we choose since the formula we have to prove is symmetric in $n$ and $k$. We'll induct on $n$. The given formula is $A(n)$. For $n = 0$, the formula becomes $F_{k+1} = F_{k+1}$, which is true. For $n > 0$,

\begin{align*}
F_{n+k+1} &= F_{(n-1)+(k+1)+1} && \text{using the hint} \\
&= F_nF_{k+2} + F_{n-1}F_{k+1} && \text{by } A(n-1) \\
&= F_n(F_{k+1} + F_k) + F_{n-1}F_{k+1} && \text{by definition of } F_{k+2} \\
&= (F_n + F_{n-1})F_{k+1} + F_nF_k && \text{by rearranging} \\
&= F_{n+1}F_{k+1} + F_nF_k && \text{by definition of } F_{n+1}.
\end{align*}
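
As a sanity check on the identity $F_{n+k+1} = F_{n+1}F_{k+1} + F_nF_k$, here is a small Python sketch. It assumes the convention $F_0 = 0$, $F_1 = 1$, which is what the base case $n = 0$ above requires; the function names are hypothetical.

```python
def fib(n):
    """Fibonacci numbers with the assumed convention F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def identity_holds(n, k):
    """Check F_{n+k+1} = F_{n+1} F_{k+1} + F_n F_k for one pair (n, k)."""
    return fib(n + k + 1) == fib(n + 1) * fib(k + 1) + fib(n) * fib(k)
```

A numerical check of this kind does not replace the induction, but it quickly catches a slipped index in the formula.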


Section  7.2 

7.2.1.  Given that $p$ and $q$ are positive integers, it does not follow that $p'$ and $q'$ are positive integers. (For example, let $p = 1$.) Thus $A(n-1)$ may not apply.

7.2.3.  You  may  object  that  the  induction  has  not  been  clearly  phrased,  but  this  can  be  overcome: 

Let $I$ be the set of interesting positive integers and let $A(n)$ be the assertion $n \in I$. If $A(1)$ is false, then even 1 is not interesting, which is interesting. The inductive step is as given in the problem: If $A(n)$ is false, then since $A(k)$ is true for all $k < n$, $n$ is the smallest uninteresting number, which is interesting.

Then  what  is  wrong?  It  is  unclear  what  "interesting"  means,  so  the  set  of  interesting  positive 
integers  is  not  a  well  defined  concept.  Proofs  based  on  foggy  concepts  are  always  suspect. 

7.2.5.  To  show  the  equivalence,  we  must  show  that  an  object  is  included  in  one  definition  if  and 
only  if  it  is  included  in  the  other.  We  do  this  by  induction  on  the  number  of  vertices.  Before  doing 
this,  however,  we  observe  that  the  objects  constructed  in  Example  7.9  are  trees: 

•  They are connected since $T_1, \ldots, T_k$ are connected by induction.

•  They have no cycles since $T_1, \ldots, T_k$ have no cycles by induction and have no vertices in common by assumption.

We  now  turn  to  the  inductive  proof  of  equivalence. 

(i)  You  should  be  able  to  see  that  both  definitions  include  the  single  vertex. 

(ii)  Now for the inductive step in one direction: Suppose $T$ has $n > 1$ vertices and is included in the definition in Example 7.9. By the induction hypothesis, $T_1, \ldots, T_k$ are included in Definition 5.12 (p. 139). By the construction in Example 7.9, the roots of $T_1, \ldots, T_k$ are ordered and are the children of the root of the new tree. Furthermore, joining $T_1, \ldots, T_k$ to a new root preserves the orderings and parent-child relationships in the $T_j$. Hence this tree satisfies Definition 5.12.

(iii)  Now for the other direction. Let $r_1, \ldots, r_k$ be the ordered children of the root $r$ of the tree $T$ in Definition 5.12. Following on down through the children of $r_j$, we obtain an RP-tree $T_j$ which is included in Example 7.9 since it has fewer than $n$ vertices. By the argument in (ii), the construction forms an RP-tree from $T_1, \ldots, T_k$ which can be seen to be the same as $T$.

Section  7.3

7.3.1  (a)  We must compare as long as both lists have items left in them. After all items have been removed from one list, what remains can simply be appended to what has been sorted. All items will be removed from one list the quickest if each comparison results in removing an item from the shorter list. Thus we need at least $\min(k_1, k_2)$ comparisons.

On the other hand, suppose we have $k_1 + k_2$ items and the smallest ones are in the shorter list. In this case, all the items are removed from the shorter list and none from the longer in the first $\min(k_1, k_2)$ comparisons, so we have achieved the minimum.

(b)  Here's the code. Note that the two lists have lengths $m$ and $n - m$ and that $\min(m, n - m) = m$ because $m \le n/2$.

Procedure c(n)
    c = 0
    If (n = 1), then Return c
    Let m be n/2 with remainder discarded
    c = c + c(m)
    c = c + c(n - m)
    c = c + m
    Return c
End

(c)  We have $c(2^0) = 0$ and $c(2^{k+1}) = 2c(2^k) + 2^k$ for $k \ge 0$. The first few values are

$$c(2^0) = 0, \quad c(2^1) = 2^0, \quad c(2^2) = 2 \times 2^1, \quad c(2^3) = 3 \times 2^2, \quad c(2^4) = 4 \times 2^3.$$

This may be enough to suggest the pattern $c(2^k) = k \times 2^{k-1}$; if not, you can compute more values until the pattern becomes clear.

We prove it by induction. The conjecture $c(2^k) = k \times 2^{k-1}$ is the induction assumption. For $k = 0$, we have $c(2^0) = 0$, and this is what the formula gives. For $k > 0$, we use the recursion to reduce $k$ and then use the induction assumption:

$$c(2^k) = 2c(2^{k-1}) + 2^{k-1} = 2 \times (k-1) \times 2^{k-2} + 2^{k-1} = k \times 2^{k-1},$$

which completes the proof.
When $k$ is large,

$$\frac{c(2^k)}{C(2^k)} = \frac{k \times 2^{k-1}}{(k-1)2^k + 1} = \frac{k/2}{k - 1 + 2^{-k}} \sim \frac{1}{2}.$$

This shows that the best case and worst case differ by about a factor of 2, which is not very large.
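
The procedure in (b) translates almost line for line into Python, which makes it easy to tabulate values and compare them with the closed form from (c). This is a sketch only; the name `ratio_to_worst_case` is hypothetical, and the worst-case count $(k-1)2^k + 1$ is taken from the formula quoted above rather than computed independently.

```python
def c(n):
    """Best-case number of comparisons for merge sorting n items,
    following the recursive procedure in part (b)."""
    if n == 1:
        return 0
    m = n // 2                     # n/2 with remainder discarded
    return c(m) + c(n - m) + m     # two recursive sorts plus a best-case merge


def ratio_to_worst_case(k):
    """c(2^k) divided by the worst-case count (k-1)*2^k + 1 quoted in (c)."""
    return c(2 ** k) / ((k - 1) * 2 ** k + 1)
```

For moderately large `k`, `ratio_to_worst_case(k)` is already close to 1/2, in line with the limit computed above.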

7.3.3.  Here  is  code  for  computing  the  number  of  moves. 

Procedure M(n)
    M = 0
    If (n = 1), then Return M
    Let m be n/2 with remainder discarded
    M = M + M(m)
    M = M + M(n - m)
    M = M + n
    Return M
End

This gives us the recursion $M(2^k) = 2M(2^{k-1}) + 2^k$ for $k > 0$ and $M(2^0) = 0$. The first few values are

$$M(2^0) = 0, \quad M(2^1) = 2^1, \quad M(2^2) = 2 \times 2^2, \quad M(2^3) = 3 \times 2^3, \quad M(2^4) = 4 \times 2^4.$$

Thus we guess $M(2^k) = k2^k$, which can be proved by induction.

7.3.5  (a)  Here's  one  possible  procedure.  Note  that  the  remainder  must  be  printed  out  after  the 
recursive  call  to  get  the  digits  in  the  proper  order.  Also  note  that  one  must  be  careful  about 
zero:  A  string  of  zeroes  should  be  avoided,  but  a  number  which  is  zero  should  be  printed. 

OUT(m)
    If m < 0, then
        Print "-"
        Set m = -m
    End if
    Let q and 0 ≤ r ≤ 9 be determined by m = 10q + r
    If q > 0, then OUT(q)
    Print r
End

(b)  Single digits

(c)  When OUT calls itself, it passes an argument that is smaller in magnitude than the one it received; thus OUT(m) must terminate after at most $|m|$ calls.
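
A minimal Python sketch of the same idea follows. It departs from the procedure above in one labeled way: instead of printing, the hypothetical function `out` returns the digits as a string, which keeps the "recurse on the quotient first, then emit the remainder" order visible and makes the zero case easy to test.

```python
def out(m: int) -> str:
    """Decimal representation of m, built the way OUT(m) prints it."""
    if m < 0:
        return "-" + out(-m)
    q, r = divmod(m, 10)               # m = 10q + r with 0 <= r <= 9
    prefix = out(q) if q > 0 else ""   # no leading zeroes
    return prefix + str(r)             # remainder comes after the recursive call
```

Because the recursion stops as soon as the quotient is 0, `out(0)` still produces the single digit "0" while larger numbers get no leading zeroes.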

7.3.7.  The description for $k = 1$ is on the left and that for $k > 1$ is on the right:

[Figure: the two local descriptions (tree diagrams); not recoverable from the source.]

7.3.9  (a)  Let $A(n)$ be the assertion "H(n, S, E, G) takes the least number of moves." Clearly $A(1)$ is true since only one move is required. We now prove $A(n)$. Note that to do $S \to G$ we must first move all the other washers to pole E. They can be stacked only one way on pole E, so moving the washers from S to E requires using a solution to the Tower of Hanoi problem for $n - 1$ washers. By $A(n-1)$, this is done in the least number of moves by H(n - 1, S, G, E). Similarly, H(n - 1, E, S, G) moves these washers to G in the least number of moves.

(b)  Simply replace H(m, ...) with S(m) and replace a move with a 1 and adjust the code a bit to get

Procedure S(n)
    If (n = 1) Return 1.
    M = 0
    M = M + S(n - 1)
    M = M + 1
    M = M + S(n - 1)
    Return M
End

The recursion is $S(1) = 1$ and $S(n) = 2S(n-1) + 1$ when $n > 1$.

(c)  The values are 1, 3, 7, 15, 31, 63, 127.

(d)  Let $A(n)$ be "$S(n) = 2^n - 1$." $A(1)$ asserts that $S(1) = 1$, which is true. By the recursion and then the induction hypothesis we have

$$S(n) = 2S(n-1) + 1 = 2(2^{n-1} - 1) + 1 = 2^n - 1.$$

(e)  By studying the binary form of $k$ and the washer moved for small $n$ (such as $n = 4$) you could discover the following rule.

If $k = \cdots b_3b_2b_1$ is the binary representation of $k$, $b_j = 1$, and $b_i = 0$ for all $i < j$, then washer $j$ is moved.

(This simply says that $b_j$ is the lowest nonzero binary digit.) No proof was requested, but here's one. Let $A(n)$ be the claim for H(n, ...). $A(1)$ is trivial. We now prove $A(n)$. If $k < 2^{n-1}$, it follows from $S(m)$ that H(n - 1, ...) is being called and $A(n-1)$ applies. If $k = 2^{n-1}$, then we are executing $S \to G$ and so this case is verified. Finally, if $2^{n-1} < k < 2^n$, then H(n - 1, ...) is being executed at step $k - 2^{n-1}$, which differs from $k$ only in the loss of its leftmost binary bit.

(f)  Suppose that we are looking at move $k = \cdots b_3b_2b_1$ and that washer $j$ is being moved. (That means $b_j$ is the rightmost nonzero bit.) You should be able to see that this is move number $\cdots b_{j+2}b_{j+1} = (k - 2^{j-1})/2^j$ for the washer. Call this number $k'$. To determine source and destination, we must study move patterns.

The pattern of moves for a washer is either

$P_0$: $S \to G \to E \to S \to G \to E \to \cdots$ repeating or
$P_1$: $S \to E \to G \to S \to E \to G \to \cdots$ repeating.

Which washer uses which pattern? Consider washer $j$; it is easily verified that it is moved a total of $2^{n-j}$ times, after which time it must be at G. A washer following $P_i$ is at G only after move numbers of the form $3t + i + 1$ for some $t$. Thus $i + 1$ is the remainder when $2^{n-j}$ is divided by 3. The remainder is 1 if $n - j$ is even and 2 otherwise. Thus washer $j$ follows pattern $P_i$ where $i$ and $n - j$ have the same parity. If we look at the remainder after dividing $k'$ by 3,
we  can  see  what  the  source  and  destination  are  by  looking  at  the  start  of  Pi.  For  those  of  you 
familiar  with  congruences,  the  remainder  is  congruent  to  {—lyk  +  1  modulo  3. 
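
The claims in (d), (e) and (f) can be checked by generating the moves of H(n, S, E, G) and comparing them with the rules. The Python sketch below is an illustration under stated assumptions: the generator `hanoi`, the helper `lowest_set_bit` and the checker `rules_hold` are hypothetical names, the recursion bottoms out at $n = 0$ rather than $n = 1$, and the strings "SGE" and "SEG" encode the cycles $P_0$ and $P_1$.

```python
def hanoi(n, src="S", aux="E", dst="G"):
    """Yield (washer, from_pole, to_pole) for H(n, src, aux, dst)."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, dst, aux)   # H(n-1, S, G, E)
    yield (n, src, dst)                      # move washer n from S to G
    yield from hanoi(n - 1, aux, src, dst)   # H(n-1, E, S, G)


def lowest_set_bit(k):
    """1-based position j of the rightmost 1 in the binary form of k."""
    return (k & -k).bit_length()


PATTERNS = {0: "SGE", 1: "SEG"}   # the cycles P_0 and P_1


def rules_hold(n):
    """Check S(n) = 2^n - 1, the washer rule of (e) and the patterns of (f)."""
    moves = list(hanoi(n))
    if len(moves) != 2 ** n - 1:
        return False
    for k, (washer, a, b) in enumerate(moves, start=1):
        j = lowest_set_bit(k)
        k_prime = (k - 2 ** (j - 1)) // 2 ** j   # earlier moves of washer j
        cycle = PATTERNS[(n - j) % 2]
        expected = (cycle[k_prime % 3], cycle[(k_prime + 1) % 3])
        if washer != j or (a, b) != expected:
            return False
    return True
```

Here the source and destination of move $k$ are read off the cycle at positions $k' \bmod 3$ and $(k' + 1) \bmod 3$, which is one concrete way of "looking at the start of $P_i$".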

7.3.11  (a)   We  have 

    H*(n, S, E, G)

    H*(n-1, S, E, G)    S → E    H*(n-1, G, E, S)    E → G    H*(n-1, S, E, G)

(b)  The initial condition is $h^*_1 = 2$. For $n > 1$ we have $h^*_n = 3h^*_{n-1} + 2$.
Alternatively, $h^*_0 = 0$ and, for $n > 0$, $h^*_n = 3h^*_{n-1} + 2$.

(c)  The general solution is $h^*_n = 3^n - 1$. To prove it, use induction. First, it is correct for $n = 0$. Then, for $n > 0$,

$$h^*_n = 3h^*_{n-1} + 2 = 3(3^{n-1} - 1) + 2 = 3^n - 1.$$

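
The local description in (a) can be turned into a move generator and checked against the count $3^n - 1$. In the Python sketch below, the reading that a washer may only move between adjacent poles (S and E, or E and G) is an assumption about the exercise statement, suggested by the fact that every move in (a) involves pole E; `hstar` and `is_legal` are hypothetical names.

```python
def hstar(n, src="S", mid="E", dst="G"):
    """Yield (washer, from_pole, to_pole) for H*(n, src, mid, dst)."""
    if n == 0:
        return
    yield from hstar(n - 1, src, mid, dst)   # H*(n-1, S, E, G)
    yield (n, src, mid)                      # S -> E
    yield from hstar(n - 1, dst, mid, src)   # H*(n-1, G, E, S)
    yield (n, mid, dst)                      # E -> G
    yield from hstar(n - 1, src, mid, dst)   # H*(n-1, S, E, G)


def is_legal(n, moves):
    """Replay the moves; every move must involve pole E (assumed adjacency
    rule) and never place a larger washer on a smaller one."""
    poles = {"S": list(range(n, 0, -1)), "E": [], "G": []}
    for washer, a, b in moves:
        if "E" not in (a, b):
            return False
        if not poles[a] or poles[a][-1] != washer:
            return False
        if poles[b] and poles[b][-1] < washer:
            return False
        poles[b].append(poles[a].pop())
    return poles["G"] == list(range(n, 0, -1))
```

Counting the generated moves for small `n` reproduces 2, 8, 26, ..., matching $h^*_n = 3^n - 1$.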
7.3.13  (a)   We omit the picture.

(b)  Induct on $n$. It is true for $n = 1$. If $n > 1$, $\alpha_2, \ldots, \alpha_n \in G(k_2, \ldots, k_n)$ by the induction hypothesis. Thus $\alpha_1, \alpha_2, \ldots, \alpha_n$ is in $\alpha_1, H$ and $\alpha_1, R(H)$.

(c)  Induct on $n$. It is true for $n = 1$. Suppose $n > 1$ and let the adjacent leaves be $b_1, \ldots, b_n$ and $c_1, \ldots, c_n$, with $c$ following $b$. If $b_1 = c_1$, then apply the induction hypothesis to $G(k_2, \ldots, k_n)$ and the sequences $b_2, \ldots, b_n$ and $c_2, \ldots, c_n$. If $b_1 \ne c_1$, it follows from the local description that $c_1 = b_1 + 1$, that $b_2, \ldots, b_n$ is the rightmost leaf in $H$ (or $R(H)$) and that $c_2, \ldots, c_n$ is the leftmost leaf in $R(H)$ (or $H$, respectively). In either case, $b_2, \ldots, b_n$ and $c_2, \ldots, c_n$ are equal because they are the same leaf of $H$.

(d)  Let $R_n(\alpha)$ be the rank of $\alpha_1, \ldots, \alpha_n$. Clearly $R_1(\alpha) = \alpha_1 - 1$. If $n > 1$ and $\alpha_1 = 1$, then $R_n(\alpha) = R_{n-1}(\alpha_2, \ldots, \alpha_n)$. If $n > 1$ and $\alpha_1 = 2$, then $R_n(\alpha) = 2^n - 1 - R_{n-1}(\alpha_2, \ldots, \alpha_n)$. Letting $x_i = \alpha_i - 1$, we have $R_n(\alpha) = (2^n - 1)x_1 + (-1)^{x_1}R_{n-1}(\alpha_2, \ldots, \alpha_n)$ and so

$$R_n(\alpha) = (2^n - 1)x_1 + (-1)^{x_1}(2^{n-1} - 1)x_2 + (-1)^{x_1+x_2}(2^{n-2} - 1)x_3 + \cdots + (-1)^{x_1+x_2+\cdots+x_{n-1}}x_n.$$

(e)  If you got this, congratulations. Let $j$ be as large as possible so that $\alpha_1, \ldots, \alpha_j$ contains an even number of 2's. Change $\alpha_j$. (Note: If $j = 0$, $\alpha = 2, 1, \ldots, 1$, the sequence of highest rank, and it has no successor.)
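
The rank recursion in (d), its closed form, and the successor rule in (e) can be cross-checked for small $n$. The Python sketch below assumes sequences are tuples with entries 1 or 2; the names `rank`, `rank_closed` and `successor` are hypothetical.

```python
def rank(alpha):
    """R_n(alpha) computed from the recursion in (d)."""
    n = len(alpha)
    if n == 1:
        return alpha[0] - 1
    rest = rank(alpha[1:])
    return rest if alpha[0] == 1 else 2 ** n - 1 - rest


def rank_closed(alpha):
    """R_n(alpha) from the signed sum in (d), with x_i = alpha_i - 1."""
    n = len(alpha)
    total, sign = 0, 1
    for i, a in enumerate(alpha):          # i = 0 corresponds to x_1
        x = a - 1
        total += sign * (2 ** (n - i) - 1) * x
        sign *= (-1) ** x                  # sign becomes (-1)^(x_1 + ... + x_i)
    return total


def successor(alpha):
    """Apply the rule in (e); return None if alpha has no successor."""
    twos, j = 0, 0
    for i, a in enumerate(alpha, start=1):
        twos += (a == 2)
        if twos % 2 == 0:
            j = i                          # largest prefix with an even number of 2's
    if j == 0:
        return None
    changed = list(alpha)
    changed[j - 1] = 3 - changed[j - 1]    # swap 1 <-> 2 in position j
    return tuple(changed)
```

Starting from the all-ones sequence and applying `successor` repeatedly should visit ranks 0, 1, 2, ... in order and stop at 2, 1, ..., 1.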


Section  7.4 


7.4.1.  Let $M(n)$ be the minimum number of multiplications needed to compute $x^n$. We leave it to you to verify the following table for $n \le 9$.

    n      1  2  3  4  5  6  7  8  9  15  21  47  49
    M(n)   0  1  2  2  3  3  4  3  4   5   6   8   7

Since $15 = 3 \times 5$, it follows that $M(15) \le M(3) + M(5) = 5$. Likewise, $M(21) \le M(3) + M(7) = 6$. Since the binary form of 49 is $110001_2$, $M(49) \le 7$. Since $47 = 101111_2$, we have $M(47) \le 9$, but we can do better. Using $47 = 2 \times 23 + 1$ gives $M(47) \le M(23) + 2$, which we leave for you to work out. A better approach is given by $47 = 5 \times 9 + 2$. Since $x^2$ is computed on the way to finding $x^5$, it is already available and so $M(47) \le M(5) + M(9) + 1 = 8$. It turns out that these are minimal, but we will not prove that.
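
The upper bounds above can be made concrete by writing down the exponents that get computed along the way. The Python sketch below is an illustration, not part of the solution: the particular chains in `CHAINS` are one possible choice consistent with the factorizations used above, and the helper names are hypothetical. It checks only that the bounds are achievable, not that they are minimal.

```python
def is_addition_chain(chain):
    """True if the chain starts at 1 and each later exponent is the sum of two
    earlier ones (one multiplication of two already computed powers)."""
    if not chain or chain[0] != 1:
        return False
    for i in range(1, len(chain)):
        earlier = chain[:i]
        if not any(chain[i] - a in earlier for a in earlier):
            return False
    return True


def binary_method_multiplications(n):
    """Multiplications used by square-and-multiply on the binary form of n."""
    return (n.bit_length() - 1) + (bin(n).count("1") - 1)


# Exponents computed for each target; len(chain) - 1 multiplications each.
CHAINS = {
    15: [1, 2, 3, 6, 12, 15],                # (x^3)^5
    21: [1, 2, 3, 6, 7, 14, 21],             # (x^7)^3
    47: [1, 2, 4, 5, 10, 20, 40, 45, 47],    # (x^5)^9 times the saved x^2
    49: [1, 2, 3, 6, 12, 24, 48, 49],        # binary method
}
```

For 47 the chain reuses the exponent 2 produced on the way to 5, which is the one extra multiplication counted in $M(5) + M(9) + 1$.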


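7.4.3.  Let $v_n$ be the transpose of $(a_n, \ldots, a_{n+k-1})$. Then $v_{n+1} = Mv_n$ where

$$M = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
a_k & a_{k-1} & a_{k-2} & \cdots & a_1
\end{pmatrix}.$$
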


7.4.5.  Finding a maximum of $n$ items can be done in $\Theta(n)$, so the computation of all the different $F(v)$ values is the problem. Thus we could compute the values of $F$ separately from finding the maximum. However, since it's convenient to compute the maximum while we're computing the values of $F$, we'll do it.

The root $r$ of $T$ has two sons, say $s_L$ and $s_R$. Observe that the answer for the tree rooted at $r$ must be either the answer for the tree rooted at $s_L$ or the answer for the tree rooted at $s_R$ or $F(r)$. Also

$$F(r) = f(r) + F(s_L) + F(s_R).$$

Here's an algorithm that carries out this idea.

/* r is the root of the tree in what follows. */

Procedure BestSum(r)
    Call Recur(r, Fvalue, best)
    Return best
End

Procedure Recur(r, F, best)
    If r is a leaf then
        F = f(r)
        best = f(r)
    Else
        Let s_L and s_R be the sons of r.
        Recur(s_L, F_L, b_L)
        Recur(s_R, F_R, b_R)
        F = f(r) + F_L + F_R
        best = max(F, b_L, b_R)
    End if
    Return
End

Since $f(r)$ is only used once, the running time of this algorithm is $\Theta(n)$ for $n$ vertices, an improvement over $\Theta(n \ln n)$.
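
The same single-pass idea can be sketched in Python. The tree encoding is an assumption made only for this example: a leaf is a one-element tuple `(f,)` and an internal vertex is `(f, left, right)`. Instead of output parameters, the hypothetical helper `recur` returns the pair $(F, \text{best})$.

```python
def recur(node):
    """Return (F, best) for the subtree rooted at node, where F is the sum of
    f over the subtree and best is the largest F over all of its vertices."""
    if len(node) == 1:                 # leaf: (f,)
        return node[0], node[0]
    f, left, right = node              # internal vertex: (f, left, right)
    F_left, best_left = recur(left)
    F_right, best_right = recur(right)
    F = f + F_left + F_right
    return F, max(F, best_left, best_right)


def best_sum(root):
    """Largest subtree sum F(v) over all vertices v of the tree."""
    return recur(root)[1]
```

Each vertex is visited once and does a constant amount of work, which is where the linear running time comes from.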

7.4.7  (a)   We can use induction: It is easily verified for $n = 1$ and $n = 2$. For $n > 2$ we have

\begin{align*}
a_n = a_{n-1} + a_{n-2} &= (a_0F_{n-2} + a_1F_{n-3}) + (a_0F_{n-3} + a_1F_{n-4}) \\
&= a_0(F_{n-2} + F_{n-3}) + a_1(F_{n-3} + F_{n-4}) = a_0F_{n-1} + a_1F_{n-2}.
\end{align*}

(b)    Since the $a_n$ satisfy the same recursion as the Fibonacci numbers, it is easily seen that the $a$ sequence is just the $F$ sequence shifted by $k$.

Section  8.1

8.1.1.  This  is  exactly  the  situation  in  the  text,  except  that  there  is  now  one  additional  question 
when  one  reaches  a  leaf. 

8.1.3  (a)  If $T$ is not a full binary tree, there is some vertex $v$ that has only one child, say $s$. Shrink the edge $\{v, s\}$ so that $v$ and $s$ become one vertex and call the new tree $T'$. If there are $k$ leaves in the subtree whose root is $v$, then $TC(T') = TC(T) - k$.

(b)  We follow the hint. Let $k$ be the number of leaves in the subtree rooted at $v$. Since $T$ is a binary tree and $v$ is not a leaf, $k \ge 2$. Let $d = h(v) - h(l_2)$ and note that $d = (h(l_1) - 1) - h(l_2) \ge 1$. The distance to the root of every vertex in the subtree rooted at $v$ is decreased by $d$ and the distance of $l_2$ to the root is increased by $d$. Thus TC is decreased by $kd - d > 0$.

(c)  By the discussion in the proof of the theorem, we know that the height of $T$ must be at least $m$ because a binary tree of height $m - 1$ or less has at most $2^{m-1}$ leaves. Suppose $T$ had height $M > m$. By the previous part of this exercise, the leaves of $T$ have heights $M$ and, perhaps, $M - 1$. Thus, every vertex $v$ with $h(v) < M - 1$ has two children. It follows that $T$ has $2^{M-1}$ vertices $w$ with $h(w) = M - 1$. If these were all leaves, $T$ would have $2^{M-1} \ge 2^m$ leaves; however, at least one vertex $u$ with $h(u) = M - 1$ is not a leaf. Since it has two children, $T$ has at least $2^m + 1$ leaves, a contradiction.

(d)  By the previous two parts, all leaves of $T$ have height at most $m$. If $T'$ is a principal subtree of $T$, its leaves have height at most $m - 1$ in $T'$. Hence $T'$ has at most $2^{m-1}$ leaves.

The argument hints at how to construct the desired tree: Construct $T'$, a principal subtree of $T$, having all its leaves at height $m - 1$ in $T'$. Construct a binary tree $T''$ having $n - 2^{m-1}$ leaves such that $TC(T'')$ is as small as possible. The principal subtrees of $T$ will be $T'$ and $T''$.

8.1.5  (a)  Suppose the answer is $S_n$. Clearly $S_0 = 1$ since the root is the only vertex. We need a recursion for $S_n$. One approach is to look at the two principal subtrees. Another is to look at what happens when we add a new "layer" by replacing each leaf with a vertex that has two children.

For the first approach, $S_n = 1 + 2S_{n-1}$, where each $S_{n-1}$ is due to a principal subtree and the 1 is due to the root. The result follows by induction:

$$S_n = 1 + 2S_{n-1} = 1 + 2(2^n - 1) = 2^{n+1} - 1.$$

For the second approach, $S_n = S_{n-1} + 2^n$ and so $S_n = (2^n - 1) + 2^n = 2^{n+1} - 1$. By the way, if we have both recursions, we can avoid induction since we can solve the two equations

$$S_n = 1 + 2S_{n-1} \qquad\text{and}\qquad S_n = S_{n-1} + 2^n$$

to obtain the formula for $S_n$. Thus, by counting in two ways (the two recursions), we don't need to be given the formula ahead of time since we can solve for it.

(b)  Let the value be $TC^*(n)$. Again, we use induction and there are two approaches to obtaining a recursion. Clearly $TC^*(0) = 0$, which agrees with the formula.

The first approach to a recursion: Since the principal subtrees of $T$ each store $S_{n-1}$ keys and since the path lengths all increase by 1 when we adjoin the principal subtrees to a new root, $TC^*(n) = 2(S_{n-1} + TC^*(n-1))$. Thus

$$TC^*(n) = 2\bigl(2^n - 1 + (n-2)2^n + 2\bigr) = 2\bigl((n-1)2^n + 1\bigr) = (n-1)2^{n+1} + 2.$$

For the second approach, $TC^*(n) = TC^*(n-1) + n2^n$. Again, we can prove the formula for $TC^*(n)$ by induction or, as in (a), we can solve the two recursions directly.

Figure S.8.1  The decision trees for binary insertion sorts. Go to the left at vertex $i{:}j$ if $u_i < s_j$ and to the right otherwise. (You may have done the reverse and gotten the mirror images. That's fine.)
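
Returning to Exercise 8.1.5, the two recursions for $S_n$ and the two for $TC^*(n)$ can be compared against the closed forms numerically. The Python sketch below is illustrative only; the function names are hypothetical and the base values $S_0 = 1$ and $TC^*(0) = 0$ are the ones for a tree consisting of a single root.

```python
def S_subtrees(n):
    """S_n from the principal-subtree recursion S_n = 1 + 2 S_{n-1}."""
    return 1 if n == 0 else 1 + 2 * S_subtrees(n - 1)


def S_layers(n):
    """S_n from the layer recursion S_n = S_{n-1} + 2^n."""
    return 1 if n == 0 else S_layers(n - 1) + 2 ** n


def TC_subtrees(n):
    """TC*(n) from TC*(n) = 2 (S_{n-1} + TC*(n-1))."""
    return 0 if n == 0 else 2 * (S_subtrees(n - 1) + TC_subtrees(n - 1))


def TC_layers(n):
    """TC*(n) from TC*(n) = TC*(n-1) + n 2^n."""
    return 0 if n == 0 else TC_layers(n - 1) + n * 2 ** n
```

Both pairs of recursions agree with $2^{n+1} - 1$ and $(n-1)2^{n+1} + 2$ for every small $n$ tried, which is the "counting in two ways" observation in executable form.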


Section  8.2

8.2.1.  Here are the first few and the last.

1.  Start the sorted list with 9.

2.  Compare 15 with 9 and decide to place it to the right giving 9, 15.

3.  Compare 6 with 9 to get 6, 9, 15.

4.  Compare 12 with 9 and then with 15 to get 6, 9, 12, 15.

5.  Compare 3 with 9 and then with 6 to get 3, 6, 9, 12, 15.


16.  We now have the sorted list 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16. Compare 8 with 7, with 12, with 10 and then with 9 to decide where it belongs.

8.2.3.  See Figure S.8.1. To illustrate, suppose the original list is 3, 1, 2. Thus $u_1 = 3$, $u_2 = 1$ and $u_3 = 2$.

•  We start by putting $u_1$ in the sorted list, so we have $s_1 = 3$.

•  Now $u_2$ must be inserted into the list $s_1$. We compare $u_2$ with $s_1$, the 2:1 entry. Since $s_1 = 3 > 1 = u_2$, we go to the left and our sorted list is 1, 3 so now $s_1 = 1$ and $s_2 = 3$.

•  Now $u_3$ must be inserted into the list $s_1, s_2$. Since we are at 3:1, we compare $u_3 = 2$ with $s_1 = 1$ and go to the right. At this point we know that $u_3$ must be inserted into the list $s_2$. We compare $u_3 = 2$ with $s_2 = 3$ at 3:2 and go to the left.

8.2.5  (a)  Suppose that the alphabet has $L$ letters and let the $i$th letter (in order) be $a_i$. Let $u_j$ be a word with exactly $k$ letters. The following algorithm sorts $u_1, \ldots, u_n$ and returns the result as $x_1 \le \cdots \le x_n$.

BUCKET(u_1, ..., u_n)
    Copy u_1, ..., u_n to x_1, ..., x_n.
    /* t is the position in the word. */
    For t = k to 1
        /* Make the buckets. */
        Create L empty ordered lists.
        For j = 1 to n
            If the tth letter of x_j is a_i,
                then place x_j at the end of the ith list.
        End for
        Copy the ordered lists to x_1, ..., x_n, starting with
            the first item in the first list and ending with the
            last item in the Lth list.
    End for
End

(b)    Extend all words to $k$ letters by introducing a new letter called "blank" which precedes all other letters alphabetically. Apply the algorithm in (a).
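
The bucket procedure in (a) and the padding trick in (b) can be sketched in Python as follows. This is an illustration under assumptions: words are Python strings, the alphabet is passed as a string listing the letters in order, the blank is a space that does not occur in the alphabet or at the end of any word, and the function names are hypothetical.

```python
def bucket_sort(words, alphabet):
    """Sort equal-length words by repeated bucketing on positions k, k-1, ..., 1."""
    x = list(words)
    k = len(x[0]) if x else 0
    for t in range(k - 1, -1, -1):                 # last letter first
        buckets = {letter: [] for letter in alphabet}
        for w in x:
            buckets[w[t]].append(w)                # append keeps earlier order (stable)
        # Concatenate the buckets in alphabetical order.
        x = [w for letter in alphabet for w in buckets[letter]]
    return x


def bucket_sort_padded(words, alphabet, blank=" "):
    """Part (b): pad every word with blanks to the longest length, then sort.
    The blank is placed before all other letters in the ordering."""
    k = max(map(len, words), default=0)
    padded = [w.ljust(k, blank) for w in words]
    return [w.rstrip(blank) for w in bucket_sort(padded, blank + alphabet)]
```

Processing positions from last to first works because each pass is stable: ties on the current letter keep the order established by the later positions.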

8.2.7.  First divide the list into two equally long tapes, say

A:  9, 15, 6, 12, 3, 7, 11, 5       B:  14, 1, 10, 4, 2, 13, 16, 8.

Think of each tape as containing a series of 1 long (sorted) lists. (The commas don't appear on the tapes, they're just there to help you see where the lists end.) Merge the first half of each tape, list by list, to tape C and the last halves to D. This gives us the following tapes containing a series of 2 long sorted lists:

C:  9 14, 1 15, 6 10, 4 12       D:  2 3, 7 13, 11 16, 5 8.

Now we merge these 2 long lists to get 4 long lists, writing the results on A and B:

A:  2 3 9 14, 1 7 13 15       B:  6 10 11 16, 4 5 8 12.

Merging back to C and D gives

C:  2 3 6 9 10 11 14 16       D:  1 4 5 7 8 12 13 15.

These are merged to produce one 16 long list on A and nothing on B.

8.2.9.  A split requires $n - 1$ comparisons since the chosen item must be compared with every other item in the list. In the worst case, we may split an $n$ long list into one of length 1 and another of length $n - 1$. We then apply Quicksort to the list of length $n - 1$. If $W(n)$ comparisons are needed, then $W(1) = 0$ and $W(n) = n - 1 + W(n-1)$ for $n > 1$. Thus $W(n) = \sum_{k=0}^{n-1} k = n(n-1)/2$.

Suppose that $n = 2^k$ and the lists are split evenly. Let $E(k)$ be the number of comparisons. Since Quicksort is applied to two lists of length $n/2$ after splitting, $E(k) = n - 1 + 2E(k-1)$ for $k > 0$ and $E(0) = 0$. A little computation gives us $E(1) = 1$, $E(2) = 5$, $E(3) = 17$, $E(4) = 49$ and $E(5) = 129$. From the statement of the problem we expect $E(k)$ to be near $k2^k$, which has the values 0, 2, 8, 24 and 64. Comparing these sequences we discover that $E(k) = (k-1)2^k + 1$ for $k < 6$. This is easily proved by induction.
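
Both recursions are easy to tabulate, which is how the pattern for $E(k)$ can be spotted in the first place. A minimal Python sketch, with hypothetical function names, follows.

```python
def W(n):
    """Worst-case comparisons: W(1) = 0, W(n) = n - 1 + W(n - 1)."""
    return 0 if n == 1 else n - 1 + W(n - 1)


def E(k):
    """Comparisons for n = 2^k with even splits:
    E(0) = 0, E(k) = 2^k - 1 + 2 E(k - 1)."""
    return 0 if k == 0 else 2 ** k - 1 + 2 * E(k - 1)
```

Printing `[E(k) for k in range(6)]` next to `[k * 2**k for k in range(6)]` makes the gap between the two sequences visible and suggests the correction term.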
Section  8.3

8.3.1.  Since there are only 3 things, you cannot compare more than one pair of things at any time. By Theorem 8.1, we need at least $\log_2(3!)$ comparisons; i.e., at least three. A network with three comparisons that sorts is given in Figure 8.2.

8.3.3.  As  argued  in  the  previous  two  solutions,  we  will  need  at  least  seven  comparisons  and  we  can 
do  at  least  two  per  time.  This  means  it  will  take  at  least  four  time  units.  It  has  been  shown  (but 
not  in  this  text!)  that  at  least  five  time  units  are  required.  A  brick  wall  sort  works. 

8.3.5.  One possibility is the type of network shown in Figure 8.3. For $n$ inputs, this has

$$1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2}$$

comparators. It was noted in the text that a brick wall for $n$ items must have length $n$. If $n$ is even there are $(n/2)(n-1)$ comparators and if $n$ is odd there are $n((n-1)/2)$ comparators. Thus this is the same as Figure 8.3. We don't know if it can be done with fewer.

8.3.7.  By  the  Adjacent  Comparisons  Theorem,  we  need  only  check  the  sequence  n, ...  ,2, 1.  Using 
the  argument  that  proves  the  Zero-One  Principle,  it  follows  that  this  sequence  is  sorted  if  and  only 
if  all  sequences  that  consist  of  a  string  of  ones  followed  by  a  string  of  zeroes  are  sorted. 

8.3.9.  It  is  evident  that  the  idea  in  the  solution  to  (a)  of  the  previous  exercise  works  for  any  n.  This 
can  be  used  as  the  basis  of  an  inductive  proof. 

An  alternative  proof  can  be  given  using  sequences  that  consist  of  ones  followed  by  zeroes.  (See 
Exercise  8.3.7.)  Note  that  when  the  lowest  1  starts  moving  down  through  the  comparators,  it  moves 
down  one  line  each  time  unit  until  it  reaches  the  bottom.  The  1  immediately  above  it  starts  the  same 
process  one  time  unit  later.  The  1  immediately  above  this  one  starts  one  more  time  unit  later,  and 
so  forth.  If  there  are  j  ones  and  the  lowest  1  reaches  the  bottom  after  an  exchange  at  time  t,  then 
the  next  1  reaches  proper  position  after  an  exchange  at  time  t  +  1.  Continuing  in  this  way,  all  ones 
are  in  proper  position  after  the  exchanges  at  time  t  +  j  —  1.  Suppose  the  jth  1  (i.e.,  lowest  1)  starts 
moving  by  an  exchange  at  time  i.  Since  it  reaches  position  after  n  —  j  exchanges,  t  =  i  +  (n  —  j)  —  1. 
Thus  all  ones  are  in  position  after  the  exchanges  at  time  (i  +  (n  —  j)  —  1)  +  j  —  1  =  n  +  i  —  2. 
The  jth  1  starts  moving  when  it  is  compared  with  the  line  below  it.  This  happens  at  time  1  or  2. 
Thus $n + i - 2 \le n$.
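
The comparator count from 8.3.5 and the claim that a brick wall of length $n$ sorts can both be checked on zero-one inputs, which suffices by the Zero-One Principle. The Python sketch below assumes that "brick wall" means the network whose time steps alternate between comparing lines (1,2), (3,4), ... and lines (2,3), (4,5), ...; that reading, and the name `brick_wall_sort`, are assumptions of this example.

```python
from itertools import product


def brick_wall_sort(seq):
    """Apply a brick wall network of length n = len(seq).
    Returns (resulting list, number of comparators used)."""
    x = list(seq)
    n = len(x)
    comparators = 0
    for t in range(n):                       # n time units
        for i in range(t % 2, n - 1, 2):     # alternate the two layers of bricks
            comparators += 1
            if x[i] > x[i + 1]:
                x[i], x[i + 1] = x[i + 1], x[i]
    return x, comparators


def sorts_all_zero_one_inputs(n):
    """True if every 0-1 sequence of length n comes out sorted."""
    return all(brick_wall_sort(bits)[0] == sorted(bits)
               for bits in product([0, 1], repeat=n))
```

For each $n$ tried, the count returned is $n(n-1)/2$, whether $n$ is even or odd.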

8.3.11.  Use induction on $n$. For $n = 2^1$, it works. Suppose that it works for all powers of 2 less than $2^t$. We use the variation of the Zero-One Principle mentioned in the text. Suppose that the first half of the $x_i$'s contains $\alpha$ zeroes and the second half contains $\beta$ zeroes. BMERGE calls BMERGE2 with $k = j = 2^{t-1}$. By the induction assumption, BMERGE2 rearranges the "odd" sequence $x_1, x_3, \ldots, x_{2^t-1}$ in order and the "even" sequence $x_2, x_4, \ldots, x_{2^t}$ in order. The number of zeroes in the odd sequence minus the number of zeroes in the even sequence is 0, 1 or 2, depending on how many of $\alpha$ and $\beta$ are odd. When the difference is 0 or 1, the result of BMERGE2 is sorted. Otherwise, the last zero in the odd sequence, $x_{\alpha+\beta+1}$, is after the first one in the even sequence, $x_{\alpha+\beta}$, and all other $x_i$'s are in order. The comparator in BMERGE with $i = (\alpha + \beta)/2$ fixes this.

8.3.13  (a)  Since a one long list is sorted, nothing is done and so $S(0) = 0$. The two recursive calls of BSORT can be implemented by a network in which they run during the same time interval. This can then be followed by the BMERGE and so $S(N) \le S(N-1) + M(N)$.

(b)  As for $S(0) = 0$, $M(0) = 0$ is trivial. Since all the comparators mentioned in BMERGE can be run in parallel at the same time, $M(N) \le M(N-1) + 1$.

(c)  From the previous part, it easily follows by induction on $N$ that $M(N) \le N$. Thus $S(N) \le S(N-1) + N$ and the desired result follows by induction on $N$.

(d)   If $2^{N-1} < n \le 2^N$, then the above ideas show that $S(n) \le N(N+1)/2$. Thus

$$S(n) \le \tfrac{1}{2}(1 + \log_2 n)(2 + \log_2 n).$$

Section  9.1 

9.1.1.  We  do  PREV(T). 

PREV(T)
    Let r be the root of T
    Let T_1, ..., T_k be the principal subtrees of T
    Output r
    For i = 1, ..., k PREV(T_i)
End

9.1.3.  We  give  pseudocode  for  vertex  visitation. 

BFV(T)
    Initialize queue
    INQUEUE(T)
    While queue not empty
        S = OUTQUEUE( )
        Let r be the root of S
        Let S_1, ..., S_k be the principal subtrees of S
        Output r
        For i = 1, ..., k INQUEUE(S_i)
    End while
End
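
For comparison, here is how PREV from 9.1.1 and BFV from 9.1.3 might look in Python. The encoding of a tree as a pair `(label, [subtrees])` is an assumption of this sketch, as are the lowercase function names; both functions return the list of labels in visitation order rather than printing them.

```python
from collections import deque


def prev(tree):
    """Preorder vertex list: the root, then each principal subtree in order."""
    label, subtrees = tree
    visited = [label]
    for t in subtrees:
        visited.extend(prev(t))
    return visited


def bfv(tree):
    """Breadth first vertex list, using a queue of subtrees as in BFV."""
    queue = deque([tree])                    # INQUEUE(T)
    visited = []
    while queue:
        label, subtrees = queue.popleft()    # OUTQUEUE()
        visited.append(label)
        queue.extend(subtrees)               # INQUEUE each principal subtree
    return visited
```

The only structural difference between the two is where pending subtrees wait: on the call stack for the preorder traversal, in a first-in first-out queue for the breadth first one.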

9.1.5.  The  proof  can  be  done  by  induction  on  the  size  of  the  tree  by  showing  that  the  comments  in 
the  algorithm  are  correct.  To  do  this,  we  need  to  notice  a  couple  of  things. 

•  By removing r from G before constructing S, we guarantee that S will not contain r.
   Thus it will contain precisely the vertices that are reachable on a path from r, starting
   with the edge {r, s}.

•  Because we remove the root vertex of the tree from G and do this recursively, whenever
   a tree is ready to return, all its vertices have been removed from G. As a result, none of
   the vertices in S are left in G when we construct R.

9.1.9 (a)  $D(T)$ is $+1, D(T_1), -1, +1, D(T_2), -1, \ldots, +1, D(T_m), -1$.

(b)  Each edge is traversed twice, proving the sum. The rest can be proved by induction on the
number of vertices using the formula in (a). Actually, one can show more: The sum up to $k$ is
the length of the path from the root to the vertex that is reached after $k$ steps.

(c)  The "if" part follows from (b). The "only if" part can be done by showing that there is a
unique way to construct a tree associated with such a sequence. This can be done recursively
if we use the observation that the sum up to $k$ is 0 if and only if we have returned to the root
after $k$ steps: Let $k$ be the first index for which the sum is 0. We must have $s_1 = +1$, $s_k = -1$
and the subsequences $s_2, \ldots, s_{k-1}$ and $s_{k+1}, \ldots, s_n$ are associated with unique RP-trees. There
is just one way to piece these trees together.
Section 9.2

9.2.1.  [Figure: the expression trees for parts (a) through (e).]

9.2.3.  We use value(···) to indicate the value of a variable or constant.

(a)  The first method:

EVALUATE(exp)
    If (exp has no op)   Return value(exp).
    If (exp = −exp1)     Return −EVALUATE(exp1).
    Let exp = (exp1 op exp2).
    Return EVALUATE(exp1) op EVALUATE(exp2).
End

(b)  The second method:

EVALUATE(T)
    Let r be the root of T.
    Let k be the number of principal subtrees of T
        and let T_i be the ith of them.
    If (k = 0), Return value(r).
    For i = 1,...,k  Let v_i = EVALUATE(T_i).
    /* If k = 1, r should be unary minus. */
    If k = 1, Return r v_1.
    If k = 2, Return v_1 r v_2.
End
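
The second method can be sketched in Python as follows. The tree representation (label, list of subtrees), the `env` dictionary holding variable values, and the operator table are assumptions introduced for this illustration only.

```python
import operator

# Binary operators allowed at internal vertices (an assumed, minimal set).
BINARY = {"+": operator.add, "-": operator.sub,
          "*": operator.mul, "/": operator.truediv}


def evaluate(tree, env):
    """Evaluate an expression tree given as (label, [principal subtrees])."""
    root, subtrees = tree
    if not subtrees:                        # k = 0: a variable or a constant
        return env[root] if isinstance(root, str) else root
    values = [evaluate(sub, env) for sub in subtrees]
    if len(values) == 1:                    # k = 1: r should be unary minus
        return -values[0]
    return BINARY[root](values[0], values[1])   # k = 2: v_1 r v_2


# The expression (X + 3) * (-Y)
expr = ("*", [("+", [("X", []), (3, [])]), ("-", [("Y", [])])])
result = evaluate(expr, {"X": 2, "Y": 4})
print(result)
```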


9.2.5.  We will indicate what needs to be added. Other solutions are possible.

(a)  exp → − term

(b)  term → power   and   power → factor | factor ** power

(c)  Let subst be the start symbol now and add   subst → exp | id := exp

(d)  This is a bit trickier because the := must reach as far to the right as possible. In particular,
you cannot replace the last three items in the following list with just factor → subst. Let
start be the start symbol.

    start  → exp | subst
    subst  → id := exp | id := subst
    exp    → exp + subst | exp − subst
    term   → term * subst | term / subst
    factor → ( subst )
Section 9.3

9.3.1.  The construction starts with •. The first iteration gives • and all trees that have children
produced in the starting step. Thus we get

[Figure: the trees obtained after the first iteration.]

In the next iteration, we obtain the following new trees with at most 4 vertices.

[Figure: the new trees with at most 4 vertices.]

In the next step, the only new tree is a 4-vertex tree consisting of a path from the root to a single
leaf. After this, no new trees with fewer than 5 vertices are obtained.

9.3.3.  For $k \le 7$, the values are in the text; $b_8 = 429$, $b_9 = 1430$ and $b_{10} = 4862$.
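
These values can be reproduced with a few lines of Python. The sketch assumes the recursion $b_n = b_1b_{n-1} + b_2b_{n-2} + \cdots + b_{n-1}b_1$ with $b_1 = 1$, which is the form the products $b_jb_{n-j}$ take in the rank calculations of Exercise 9.3.5; the function name is illustrative.

```python
def leaf_counts(n_max):
    """Return [b_0, b_1, ..., b_n_max] with b_0 = 0 used only as padding."""
    b = [0, 1]
    for n in range(2, n_max + 1):
        # split the n leaves between the left and right sons
        b.append(sum(b[k] * b[n - k] for k in range(1, n)))
    return b


b = leaf_counts(10)
print(b[8], b[9], b[10])
```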

9.3.5.  We'll use n: r to mean a tree with n leaves and rank r and (n_1: r_1, n_2: r_2) to mean a tree with
left son n_1: r_1 and right son n_2: r_2. We use formula (9.5) and the greedy approach: First make $|T_1|$ as
large as possible, then make RANK($T_1$) as large as possible. Here are the calculations. You should
be able to construct the trees easily from the results as long as you remember (a) that n: 0 describes
a tree in which all the left sons are leaves (since that is the leftmost tree in the list of trees) and
(b) that since there is only one tree with 1 leaf and only one with 2 leaves, they each have rank 0.

    8:100 = (1:0, 7:100)    since $b_1b_7 = 132 > 100$, we have $|T_1| = 1$ and $100 = 0\cdot b_7 + 100$
    7:100 = (6:10, 1:0)     since $b_1b_6 + \cdots + b_5b_2 = 90$ and $10 = 10\,b_1 + 0$
    6:10  = (1:0, 5:10)     since $b_1b_5 = 14 > 10$ and $10 = 0\cdot b_5 + 10$
    5:10  = (4:1, 1:0)      since $b_1b_4 + \cdots + b_3b_2 = 9$ and $1 = 1\cdot b_1 + 0$
    4:1   = (1:0, 3:1)
    3:1   = (2:0, 1:0)

    8:200 = (3:1, 5:12)     since $b_1b_7 + b_2b_6 = 174$ and $26 = 1\cdot b_5 + 12$
    5:12  = (4:3, 1:0)      since $b_1b_4 + b_2b_3 + b_3b_2 = 9$
    4:3   = (3:0, 1:0)      since $b_1b_3 + b_2b_2 = 3$

    8:300 = (7:3, 1:0)
    n:3   = (1:0, n−1:3)    for $n \ge 5$ since $b_1b_{n-1} > 3$
    4:3   = (3:0, 1:0)

    8:400 = (7:103, 1:0)    7:103 = (6:13, 1:0)    6:13 = (1:0, 5:13)
    5:13  = (4:4, 1:0)      4:4 = (3:1, 1:0)

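
The greedy step can be written as a short Python function. This is a sketch under the assumption that formula (9.5) has the form RANK(T) = $b_1b_{n-1} + \cdots + b_{j-1}b_{n-j+1}$ + RANK($T_1$)$\,b_{n-j}$ + RANK($T_2$) with $j = |T_1|$, which is what the calculations above use; the names `leaf_counts` and `split` are illustrative.

```python
def leaf_counts(n_max):
    """b[n] = number of trees with n leaves (b[0] = 0 is padding)."""
    b = [0, 1]
    for n in range(2, n_max + 1):
        b.append(sum(b[k] * b[n - k] for k in range(1, n)))
    return b


def split(n, r, b):
    """Decompose the tree n:r into ((n1, r1), (n2, r2))."""
    for n1 in range(1, n):
        block = b[n1] * b[n - n1]   # trees whose left son has n1 leaves
        if r < block:
            r1, r2 = divmod(r, b[n - n1])
            return (n1, r1), (n - n1, r2)
        r -= block                  # skip this block of ranks
    raise ValueError("rank out of range")


b = leaf_counts(8)
for rank in (100, 200, 300, 400):
    print(f"8:{rank} =", split(8, rank, b))
```

Applying `split` again to each son with more than one leaf reproduces the remaining lines of each calculation.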
9.3.7  (a)    We  omit  the  pictures. 

(b)  In the notation introduced in Exercise 9.3.5, with $n = 2m+1$ and $k = b_n/2$, we claim
that $M_n$ = n: k = (m+1: 0, m: 0). To prove this, note that the rank of this tree is
$b_1b_{2m} + \cdots + b_mb_{m+1}$ and that

$$b_n = b_1b_{2m} + \cdots + b_{2m}b_1 = 2(b_1b_{2m} + \cdots + b_mb_{m+1}).$$

(c)  If $n = 2m$, then $b_n = 2(b_1b_{2m-1} + \cdots + b_{m-1}b_{m+1}) + b_m^2$, which is divisible by 2 if and only
if $b_m$ is. Thus there is no such tree unless $b_m$ is even. In this case you should be able to show
that $M_{2m}$ = (m: $b_m/2$, m: 0).
9.3.9.  Here is one way to define an equivalence relation $\equiv$ by induction on the number of vertices.

Let $\mathcal{T}_n$ be the set of labeled n-vertex RP-trees. Define all the trees in $\mathcal{T}_1$ to be equivalent. If $n > 1$,
suppose $T, T' \in \mathcal{T}_n$. Let $T$ be built from $T_1, \ldots, T_k$ and $T'$ from $T'_1, \ldots, T'_\ell$ according to the recursive
construction in Example 7.9 (p. 206). We define $T \equiv T'$ if and only if $k = \ell$ and $T_i \equiv T'_i$ for $1 \le i \le k$.

9.3.11 (a)  An $x_i$ belongs in a parenthesis pair that has nothing inside it. Number the empty pairs
from left to right and insert $x_i$ into the $i$th pair.

(b)  This is just a translation of what has been said. If you are confused, remember that $B(n)$
is to be thought of as all possible parentheses patterns for $x_1, \ldots, x_n$.

(c)  This simply involves the replacement described in (b): Make • correspond to ( ) and make
the tree with sons $T_1$ and $T_2$ correspond to $(P_1P_2)$, where $P_i$ is the parentheses pattern
corresponding to $T_i$.

9.3.13 (a)  The leaves in an RP-tree are distinguishable because the tree is ordered. Thus, each
marking of the $n$ leaves leads to a different situation. The same comment applies to vertices
and there are $2n-1$ vertices by Exercise 9.3.6.

(b)  Mark the single vertex that arises in this way to obtain an element of $\mathcal{V}_n$. Interchanging $x$ and
the tree rooted at $b$ gives a different element of $\mathcal{L}_{n+1}$ that gives rise to the same element of $\mathcal{V}_n$.
Conversely, given any element of $\mathcal{V}_n$, the marked vertex should be split into two, $f$ and
$b$ with $b$ a son of $f$. Introduce another son $x$ of $f$ which is a marked leaf. There are two
possibilities: make $x$ a left son or a right son.

(c)  By (a), $|\mathcal{L}_n| = nb_n$ and $|\mathcal{V}_n| = (2n-1)b_n$. By (b), $|\mathcal{L}_{n+1}| = 2|\mathcal{V}_n|$.

(d)  By the recursion,

$$b_n = \frac{2(2n-3)}{n}\,b_{n-1} = \frac{2(2n-3)}{n}\,\frac{2(2n-5)}{n-1}\,b_{n-2} = \cdots = \frac{2^{n-1}(2n-3)(2n-5)\cdots 1}{n(n-1)\cdots 2}\,b_1.$$

Using $b_1 = 1$, we have a simple formula; however, it can be written more compactly:

$$b_n = \frac{2^{n-1}(2n-3)(2n-5)\cdots 1}{n!} = \frac{2^{n-1}(n-1)!\,(2n-3)(2n-5)\cdots 1}{(n-1)!\,n!} = \frac{(2n-2)!}{(n-1)!\,n!} = \frac{1}{n}\binom{2n-2}{n-1}.$$



Section  10.1 

10.1.1.  The $p$ and $q$ calculations can be done by multiplication. If so, and we are asked for the
coefficient of $x^3$, say, then we can ignore any power of $x$ greater than $x^3$ that appears in intermediate
steps.

(a)  Letting $\equiv$ mean equality of coefficients of $x^n$ for $n \le 3$, we have

$$\begin{aligned}
p^2 &\equiv (1 + x + x^2 + x^3)^2 \equiv 1 + 2x + 3x^2 + 4x^3\\
p^3 &\equiv (1 + x + x^2 + x^3)(1 + 2x + 3x^2 + 4x^3) \equiv 1 + 3x + 6x^2 + 10x^3\\
p^4 &\equiv (1 + x + x^2 + x^3)(1 + 3x + 6x^2 + 10x^3) \equiv 1 + 4x + 10x^2 + 20x^3.
\end{aligned}$$

(b)  By the opening remarks in the solution, this will be the same as (a).

(c)  We can do this by writing $r \equiv 1 + x + x^2 + x^3$ or we can write, for example, $f(x) = (1-x)^{-2}$
and use Taylor's Theorem to compute the coefficients.

(d)  Note that $r = 1 + x + x^2 + x^3 + \cdots$. Whenever you add, subtract or multiply power series and
look for the coefficient of some power of $x$, say $x^n$, only those powers of $x$ that do not exceed
$n$ in the original series matter. Each of $p$, $q$ and $r$ begins $1 + x + x^2 + x^3$.
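
The truncation idea is easy to try out. The sketch below multiplies coefficient lists while discarding every power above $x^3$; the helper name `mul_trunc` is an illustrative choice, not something from the text.

```python
def mul_trunc(p, q, n):
    """Multiply coefficient lists p and q, keeping only x^0 .. x^n."""
    out = [0] * (n + 1)
    for i, a in enumerate(p[: n + 1]):
        for j, c in enumerate(q[: n + 1 - i]):
            out[i + j] += a * c      # i + j never exceeds n
    return out


p = [1, 1, 1, 1]                     # 1 + x + x^2 + x^3
p2 = mul_trunc(p, p, 3)
p3 = mul_trunc(p, p2, 3)
p4 = mul_trunc(p, p3, 3)
print(p2, p3, p4)
```

Because every series in the exercise begins $1 + x + x^2 + x^3$, the same three lists come out whichever of them is used.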

10.1.3.  We have

$$(x^2 + x^3 + x^4 + x^5 + x^6)^8 = x^{16}(1 + x + x^2 + x^3 + x^4)^8 = x^{16}\left(\frac{1 - x^5}{1 - x}\right)^8.$$

The coefficient of $x^{21}$ in this is the coefficient of $x^5$ in the eighth power on the right hand side. Since
$(1 - x^5)^8 = 1 - 8x^5 + \cdots$, this is simply the coefficient of $x^5$ in $(1-x)^{-8}$ minus 8 times the coefficient
of $x^0$ (the constant term) in $(1-x)^{-8}$. Thus our answer is

$$\binom{12}{5} - 8 = 784.$$
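
A direct expansion gives a quick check of this count. The sketch assumes the polynomial is $(x^2 + x^3 + x^4 + x^5 + x^6)^8$ and the target is the coefficient of $x^{21}$; `poly_mul` is an illustrative helper.

```python
from math import comb


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, c in enumerate(q):
            out[i + j] += a * c
    return out


base = [0, 0, 1, 1, 1, 1, 1]         # x^2 + x^3 + x^4 + x^5 + x^6
power = [1]
for _ in range(8):                   # raise to the eighth power
    power = poly_mul(power, base)

coefficient = power[21]
print(coefficient, comb(12, 5) - 8)
```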

10.1.5.  We'll do just the general $k$.

(a)  We have $x^kA(x) = \sum_{m\ge 0} a_mx^{m+k} = \sum_{n\ge k} a_{n-k}x^n$. Thus the coefficient of $x^n$ is 0 for $n < k$
and $a_{n-k}$ for $n \ge k$.

(b)  We have

$$\left(\frac{d}{dx}\right)^k A(x) = \sum_{m=0}^{\infty} a_m\left(\frac{d}{dx}\right)^k x^m = \sum_{m=k}^{\infty} a_m\,m(m-1)\cdots(m-k+1)\,x^{m-k}.$$

Set $n = m - k$ to obtain the answer: $a_{n+k}(n+k)(n+k-1)\cdots(n+1) = a_{n+k}\,\frac{(n+k)!}{n!}$.

(c)  Since $\left(x\frac{d}{dx}\right)A(x) = \sum_{m=0}^{\infty} m\,a_mx^m$, repeating the operation $k$ times leads to $\sum_{m=0}^{\infty} m^ka_mx^m$.
Thus the answer is $n^ka_n$.

10.1.7.  This is simply the derivation of (10.4) with $r$ used instead of 1/3. The generating function
for the sum is $S(x) = 1/(1 - r(1 + x))$ and the coefficient of $x^k$ is

$$\frac{(r/(1-r))^k}{1-r} = \frac{(r/(1-r))^{k+1}}{r} = \frac{r^k}{(1-r)^{k+1}}.$$

To verify convergence, let $a_n = \binom{n}{k}r^n$ and note that

$$\lim_{n\to\infty}\frac{a_{n+1}}{a_n} = \lim_{n\to\infty}\frac{(n+1)\,r}{n-k+1} = r < 1.$$
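
A numerical spot check of the closed form is straightforward. The sketch assumes the sum in question is $\sum_{n\ge k}\binom{n}{k}r^n$ with $0 < r < 1$; the sample values of $r$ and $k$ and the cutoff of 400 terms are arbitrary choices.

```python
from math import comb, isclose


def partial_sum(r, k, terms=400):
    """Sum of C(n, k) r^n for n = k, ..., k + terms - 1."""
    return sum(comb(n, k) * r**n for n in range(k, k + terms))


def closed_form(r, k):
    """The value r^k / (1 - r)^(k + 1) obtained from the generating function."""
    return r**k / (1 - r) ** (k + 1)


for r, k in [(0.3, 2), (0.5, 3), (1 / 3, 0)]:
    print(r, k, partial_sum(r, k), closed_form(r, k))
```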


10.1.9.  This is very similar to Exercise 10.1.8. With $a_j = (-1)^j\binom{m}{j}$ and $b_j = \binom{m}{j}$, we can apply
the convolution formula. The result is $C(x) = (1-x)^m(1+x)^m = (1-x^2)^m$. By the binomial
theorem, $(1-x^2)^m = \sum_j (-1)^j\binom{m}{j}x^{2j}$. Thus, the sum we are to simplify is zero if $k$ is odd and
$(-1)^j\binom{m}{j}$ if $k = 2j$.

10.1.11.  The essential fact is that $\sum_{s=0}^{k-1}\omega^{rs}$ is $k$ if $r$ is a multiple of $k$ and 0 otherwise.

10.1.13.  This is multisection with $k = 3$ and $j = 0, 2, 1$, respectively. The basic facts that are needed
are $e^{i\theta} = \cos\theta + i\sin\theta$ and the sine and cosine of various angles in the 30°-60°-90° right triangle.
Section 10.2

10.2.1 (a)  Let $a_n = 5a_{n-1} - 6a_{n-2} + b_n$ where $b_1 = 1$ and $b_n = 0$ for $n \ne 1$. Then

$$A(x) = \sum_{k=0}^{\infty}\left(5x\,a_{k-1}x^{k-1} - 6x^2a_{k-2}x^{k-2}\right) + x = 5xA(x) - 6x^2A(x) + x.$$

Thus

$$A(x) = \frac{x}{1 - 5x + 6x^2} = \frac{1}{1-3x} - \frac{1}{1-2x}$$

and $a_n = 3^n - 2^n$.

(b)  To correct the recursion, add $c_{n+1}$ to the right side, where $c_0 = 1$ and $c_n = 0$ for $n \ne 0$.
Multiply both sides by $x^{n+1}$ and sum to obtain $A(x) = xA(x) + 6x^2A(x) + 1$. With some
algebra,

$$A(x) = \frac{1}{1 - x - 6x^2} = \frac{3/5}{1-3x} + \frac{2/5}{1+2x}$$

and so $a_n = (3^{n+1} - (-2)^{n+1})/5$.

(c)  To correct the recursion, add $b_n$ where $b_1 = 1$ and $b_n = 0$ otherwise. Thus
$A(x) = xA(x) + x^2A(x) + 2x^3A(x) + x$ and so

$$A(x) = \frac{x}{1 - x - x^2 - 2x^3} = \frac{x}{(1-2x)(1+x+x^2)}.$$

By the quadratic formula, we can factor $1 + x + x^2$ as $(1 - \omega x)(1 - \bar\omega x)$, where
$\omega = (-1 + \sqrt{-3})/2$ and $\bar\omega$ is the complex conjugate of $\omega$. Using partial fractions,

$$A(x) = \frac{2/7}{1-2x} - \frac{(3 - 2\sqrt{-3})/21}{1 - \bar\omega x} - \frac{(3 + 2\sqrt{-3})/21}{1 - \omega x}$$

and so

$$a_n = \frac{2^{n+1}}{7} - \frac{(3 - 2\sqrt{-3})\,\bar\omega^n}{21} - \frac{(3 + 2\sqrt{-3})\,\omega^n}{21}.$$

The last two terms are messy, but they can be simplified considerably by noting that $\omega^3 = 1$
and so they are periodic with period 3. Thus

$$a_n = \frac{2^{n+1}}{7} + \begin{cases} -2/7 & \text{if } n/3 \text{ has remainder 0;}\\ 3/7 & \text{if } n/3 \text{ has remainder 1;}\\ -1/7 & \text{if } n/3 \text{ has remainder 2.}\end{cases}$$

(d)  The recursion holds for $n = 0$ as well. From the recursion, $A(x) = 2xA(x) + \sum nx^n$. By
Exercise 10.1.5, the sum is $x\frac{d}{dx}\sum x^n$, which is $x/(1-x)^2$. Thus

$$A(x) = \frac{x}{(1-x)^2(1-2x)} = \frac{2}{1-2x} - \frac{1}{1-x} - \frac{1}{(1-x)^2}.$$

After some algebra with these, we obtain $a_n = 2^{n+1} - n - 2$.
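
The four closed forms can be checked against the power series of the rational generating functions. The sketch below is one way to do that with exact arithmetic; the helper `series` and the coefficient-list encoding of numerators and denominators are illustrative choices, and the denominator for (d) is $(1-x)^2(1-2x)$ multiplied out.

```python
from fractions import Fraction


def series(num, den, n_terms):
    """First n_terms coefficients of num(x)/den(x); den[0] must be nonzero."""
    coeffs = []
    for n in range(n_terms):
        c = Fraction(num[n] if n < len(num) else 0)
        for k in range(1, min(n, len(den) - 1) + 1):
            c -= den[k] * coeffs[n - k]
        coeffs.append(c / den[0])
    return coeffs


N = 12
part_a = series([0, 1], [1, -5, 6], N)        # x / (1 - 5x + 6x^2)
part_b = series([1], [1, -1, -6], N)          # 1 / (1 - x - 6x^2)
part_c = series([0, 1], [1, -1, -1, -2], N)   # x / (1 - x - x^2 - 2x^3)
part_d = series([0, 1], [1, -4, 5, -2], N)    # x / ((1 - x)^2 (1 - 2x))

periodic = [Fraction(-2, 7), Fraction(3, 7), Fraction(-1, 7)]
closed_c = [Fraction(2 ** (n + 1), 7) + periodic[n % 3] for n in range(N)]
print(part_c == closed_c)
```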

10.2.3.  Start with a string of $n - i$ zeroes. Choose without repetition $i$ of the $n + 1 - i$ positions
(before all the zeroes or after any zero) and insert a one in each position chosen. The result is an
$n$ long string with $i$ ones, none of them adjacent. The process is reversible: The position of a one is
the number of zeroes preceding it. The formula for $F_n$ follows immediately.
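
The count $\binom{n+1-i}{i}$ implied by this construction can be confirmed by brute force for small $n$. The function name below is illustrative.

```python
from itertools import product
from math import comb


def count_no_adjacent_ones(n, i):
    """Number of n-long 0/1 strings with exactly i ones, no two adjacent."""
    total = 0
    for bits in product("01", repeat=n):
        s = "".join(bits)
        if s.count("1") == i and "11" not in s:
            total += 1
    return total


for n in range(6):
    print(n, [count_no_adjacent_ones(n, i) for i in range(n + 1)])
```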
10.2.5 (a)  Replacing $A$, $B$ and $C$ with their definitions and rearranging leads to

$$L_1L_2 + L_1H_2x^m + L_2H_1x^m + H_1H_2x^{2m} = (L_1 + H_1x^m)(L_2 + H_2x^m).$$

(b)  The number of multiplications required by any procedure is an upper bound on $M(2m)$. There
are three products of polynomials of degree $m$ or less in our "less direct" procedure. If they
are done as efficiently as possible, we will have $M(2m) \le 3M(m)$.

(c)  Let $s_k = M(2^k)$. We have $s_0 = 1$ and $s_k \le 3s_{k-1}$ for $k > 0$. If we set $t_0 = 1$ and $t_k = 3t_{k-1}$
for $k > 0$, then $s_k \le t_k$. The recursion gives $T(x) = 3xT(x) + 1$ and so $t_k = 3^k$. Thus, with
$n = 2^k$, $M(n) \le 3^k = (2^{\log_2 3})^k = n^{\log_2 3}$. From tables or a calculator, $\log_2 3 = 1.58\cdots$.

(d)  To begin with, $L_1(x) = 1 + 2x$, $H_1(x) = -1 + 3x$, $L_2(x) = 5 + 2x$ and $H_2(x) = -x$. The
product $L_1L_2 = (1 + 2x)(5 + 2x)$ is computed using the algorithm. The values are

$m = 1$,    $A = (2)(2) = 4$,    $B = (1)(5) = 5$    and    $C = (1 + 2)(5 + 2) = 21$.

Thus $L_1L_2 = 5 + 12x + 4x^2$. In a similar way, the products $(-1 + 3x)(-x) = x - 3x^2$ and
$(5x)(5 + x) = 25x + 5x^2$ are computed. These are combined to give the final result:

$$(5 + 12x + 4x^2) + (x - 3x^2)x^4 + \big((25x + 5x^2) - (5 + 12x + 4x^2) - (x - 3x^2)\big)x^2,$$

which is $5 + 12x - x^2 + 12x^3 + 4x^4 + x^5 - 3x^6$.
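
The three-multiplication scheme can be sketched recursively in Python. The code assumes, as in part (d), that $A = H_1H_2$, $B = L_1L_2$ and $C = (L_1 + H_1)(L_2 + H_2)$, and that both inputs are coefficient lists of the same power-of-2 length; the function names are illustrative.

```python
def add(p, q):
    """Coefficientwise sum of two equal-length lists."""
    return [a + c for a, c in zip(p, q)]


def sub(p, q):
    """Coefficientwise difference of two equal-length lists."""
    return [a - c for a, c in zip(p, q)]


def multiply(p, q):
    """Product of two polynomials with len(p) == len(q) == a power of 2."""
    n = len(p)
    if n == 1:
        return [p[0] * q[0]]                 # one ordinary multiplication
    m = n // 2
    l1, h1 = p[:m], p[m:]                    # P1 = L1 + H1 x^m
    l2, h2 = q[:m], q[m:]                    # P2 = L2 + H2 x^m
    a = multiply(h1, h2)                     # A = H1 H2
    b = multiply(l1, l2)                     # B = L1 L2
    c = multiply(add(l1, h1), add(l2, h2))   # C = (L1 + H1)(L2 + H2)
    middle = sub(sub(c, a), b)               # C - A - B = L1 H2 + L2 H1
    out = [0] * (2 * n - 1)
    for i, v in enumerate(b):
        out[i] += v                          # B
    for i, v in enumerate(middle):
        out[i + m] += v                      # (C - A - B) x^m
    for i, v in enumerate(a):
        out[i + 2 * m] += v                  # A x^(2m)
    return out


# The polynomials of part (d): 1 + 2x - x^2 + 3x^3 and 5 + 2x - x^3.
product = multiply([1, 2, -1, 3], [5, 2, 0, -1])
print(product)
```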

(e)  We'll just look at the case in which $n = 2m = 2^k$. Let $a_k$ be the number of additions and
subtractions needed. We have $a_0 = 0$ and, for $k > 0$, $a_k$ equals $3a_{k-1}$ plus the number of
additions and subtractions needed to prepare for and use the three multiplications. Preparation
requires two additions of polynomials of degree $m - 1$. The results are three polynomials of
degree $2m - 2$. We must perform two subtractions of such polynomials. Finally, the
multiplication by $x^m$ and $x^{2m}$ arranges things so that there is some overlap among the coefficients.
In fact, there will be $2m - 2$ additions required because of these overlaps (unless some
coefficients happen to turn out zero). Since a polynomial of degree $d$ has $d + 1$ coefficients, there are
a total of

$$2(m - 1 + 1) + 2(2m - 2 + 1) + (2m - 2) = 4n - 4.$$

Thus $a_k = 3a_{k-1} + 4\times 2^k - 4$ and so $A(x) = 3xA(x) + 4\sum_{k>0}(2x)^k - 4\sum_{k>0}x^k$. Consequently,
$A(x) = 4x/((1-x)(1-2x)(1-3x))$ and $a_k = 2\times 3^{k+1} - 2^{k+3} + 2$. Comparing this with
the multiplication result, we see that we need about three times as many additions and/or
subtractions as we do multiplications, which is still much smaller than $n^2$ for large $n$.

10.2.7.  When we use the initial conditions and solve for $A(x)$ we get $A(x) = R(x) + N(x)/D(x)$, where
$R(x)$ is some polynomial, $D(x) = 1 - c_1x - \cdots - c_kx^k$ and $N(x)$ is a polynomial of degree less than
$k$. By the Fundamental Theorem of Algebra, we can factor $D(x)$ as given in the exercise. By the
theory of partial fractions, there are constants $b_{i,j}$ such that

$$\frac{N(x)}{D(x)} = \sum_i\sum_{j=1}^{d_i}\frac{b_{i,j}}{(1 - r_ix)^j}.$$

Equating coefficients and assuming $n$ is larger than the degree of $R(x)$, we have

$$a_n = \sum_i\left(\sum_{j=1}^{d_i} b_{i,j}\binom{n+j-1}{j-1}\right)r_i^n.$$

Since $\binom{n+j-1}{j-1}$ is a polynomial in $n$ of degree $j - 1$, it follows that $\sum_{j=1}^{d_i} b_{i,j}\binom{n+j-1}{j-1}$ is a polynomial
in $n$ of degree at most $d_i - 1$.

Let $d$ be the degree of $R(x)$, where the degree of 0 is $-\infty$. If $\ell$ is the largest value of $n$ for which
an initial value of $a_n$ must be specified, then $d \le \ell - k$. To find the coefficients of the polynomials
$P_i(n)$, it suffices to know the values of $a_t, \ldots, a_{t+k-1}$ for any $t$ larger than the degree of $R(x)$.


Section  10.3 

10.3.1 (a)  $(1 - x)D' - D = -e^{-x} = -(1 - x)D$ and so $(1 - x)D' - xD = 0$.

(b)  The coefficient of $x^n$ on the left of our equation in (a) is

$$\frac{D_{n+1}}{n!} - \frac{D_n}{(n-1)!} - \frac{D_{n-1}}{(n-1)!}.$$

The initial conditions are $D_0 = 1$ and $D_1 = 0$.
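
Setting that coefficient to zero and multiplying by $n!$ gives $D_{n+1} = n(D_n + D_{n-1})$. The sketch below compares this recursion with a brute-force count, under the assumption that $D_n$ counts permutations of $n$ items with no fixed point; both function names are illustrative.

```python
from itertools import permutations


def by_recursion(n_max):
    """D_0 .. D_n_max from D_{n+1} = n (D_n + D_{n-1}), D_0 = 1, D_1 = 0."""
    d = [1, 0]
    for n in range(1, n_max):
        d.append(n * (d[n] + d[n - 1]))
    return d[: n_max + 1]


def by_brute_force(n):
    """Count permutations of range(n) that leave no element in place."""
    return sum(all(p[i] != i for i in range(n))
               for p in permutations(range(n)))


print(by_recursion(7))
```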
10.3.3 (a)  We are asked to solve $Q'(x) - 2(1-x)^{-1}Q(x) = 2x(1-x)^{-3}$. The integrating factor is

$$\exp\left(\int -2(1-x)^{-1}\,dx\right) = \exp(2\ln(1-x)) = (1-x)^2.$$

Thus

$$Q(x)(1-x)^2 = \int\frac{2x}{1-x}\,dx = \int\left(-2 + \frac{2}{1-x}\right)dx = -2x - 2\ln(1-x) + C.$$

(b)  We have $-2\ln(1-x) - 2x = \sum_{k\ge 2} 2x^k/k$ and

$$(1-x)^{-2} = \sum_{k\ge 0}\binom{-2}{k}(-x)^k = \sum_{k\ge 0}(k+1)x^k.$$

By the formula for the coefficients in a product of generating functions,

$$q_n = \sum_{k=2}^{n}\frac{2(n-k+1)}{k} = 2(n+1)\sum_{k=2}^{n}\frac{1}{k} - \sum_{k=2}^{n}2
= 2(n+1)\sum_{k=1}^{n}\frac{1}{k} - 2(n+1) - 2(n-1) = 2(n+1)\sum_{k=1}^{n}\frac{1}{k} - 4n.$$
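
The final simplification can be checked with exact fractions. The sketch assumes the convolution sum is $\sum_{k=2}^{n} 2(n-k+1)/k$ and the simplified form is $2(n+1)\sum_{k=1}^{n} 1/k - 4n$; the function names are illustrative.

```python
from fractions import Fraction


def convolution_sum(n):
    """Coefficient of x^n in (sum_{k>=2} 2x^k/k) * (sum_{k>=0} (k+1)x^k)."""
    return sum(Fraction(2 * (n - k + 1), k) for k in range(2, n + 1))


def simplified(n):
    """2(n+1) H_n - 4n, where H_n is the nth harmonic number."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n


for n in range(1, 6):
    print(n, convolution_sum(n), simplified(n))
```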
Section 10.4

10.4.1 (a)  This is nothing more than a special case of the Rule of Product: at each time we can
choose anything from T.

(b)  Simply  sum  the  previous  result  on  k. 

(c)  The  hint  tells  how  to  do  it.  All  that  is  left  is  algebra. 

(d)  The  solution  is  like  that  in  the  previous  part,  except  that  we  start  with 

TeT  ^i=o  ^  TeT 

10.4.3 (a)  This is simply 2* {0, 12*}*. Thus the generating function is

$$A(x) = \frac{1}{1-x}\cdot\frac{1}{1 - \left(x + \dfrac{x}{1-x}\right)} = \frac{1}{1 - 3x + x^2}.$$

Multiply both sides by $1 - 3x + x^2$ and equate coefficients of $x^n$ to obtain the recursion

$$a_n = 3a_{n-1} - a_{n-2}\quad\text{for } n > 1$$

with initial conditions $a_0 = 1$ and $a_1 = 3$.

(b)  You should be able to see that this is described by 0*(11*0^k 0*)*. Since

$$x\cdot\frac{1}{1-x}\cdot x^k\cdot\frac{1}{1-x} = \frac{x^{k+1}}{(1-x)^2},$$

the generating function we want is

$$A(x) = \frac{1}{1-x}\cdot\frac{1}{1 - x^{k+1}/(1-x)^2} = \frac{1-x}{1 - 2x + x^2 - x^{k+1}}.$$

Clearing of fractions and equating coefficients, we obtain the recursion

$$a_n = 2a_{n-1} - a_{n-2} + a_{n-k-1}\quad\text{for } n > 1,$$

with the understanding that $a_j = 0$ for $j < 0$. The initial conditions are $a_0 = a_1 = 1$.

(c)  A possible formulation is

0* (1(11)*00*)* {λ, 1(11)*}.

This says, start with any number of zeroes, then append any number of copies of the patterns
of type Z (described soon) and then follow by either nothing or an odd number of ones. A
pattern of type Z is an odd number of ones followed by one or more zeroes. The translation
to a generating function gives

$$A(x) = \frac{1}{1-x}\cdot\frac{1}{1 - G_Z(x)}\left(1 + \frac{x}{1-x^2}\right)\quad\text{where}\quad G_Z(x) = \frac{x}{1-x^2}\cdot\frac{x}{1-x}.$$

After some algebra, the generating function reduces to

$$A(x) = \frac{1 + x - x^2}{1 - x - 2x^2 + x^3},$$

which gives $a_n = a_{n-1} + 2a_{n-2} - a_{n-3}$ for $n > 2$, with initial conditions $a_0 = 1$, $a_1 = 2$ and
$a_2 = 3$.
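
The three recursions can be tested by listing all short strings and matching them against the patterns. The sketch uses Python's `re` module as a stand-in for the regular-expression notation of the text; the translation of each pattern into `re` syntax, the choice $k = 2$ for part (b), and the helper names are assumptions made for the illustration.

```python
import re
from itertools import product


def count_matches(pattern, alphabet, n):
    """Count the n-long strings over `alphabet` matched in full by `pattern`."""
    regex = re.compile(pattern)
    return sum(1 for s in product(alphabet, repeat=n)
               if regex.fullmatch("".join(s)))


def extend(initial, step, n_max):
    """Extend `initial` to index n_max using a_n = step(a, n)."""
    a = list(initial)
    for n in range(len(a), n_max + 1):
        a.append(step(a, n))
    return a


N = 8
# (a) 2*{0, 12*}* with a_n = 3a_{n-1} - a_{n-2}
counts_a = [count_matches(r"2*(0|12*)*", "012", n) for n in range(N + 1)]
recur_a = extend([1, 3], lambda a, n: 3 * a[n - 1] - a[n - 2], N)

# (b) with k = 2: 0*(11*000*)* with a_n = 2a_{n-1} - a_{n-2} + a_{n-3}
counts_b = [count_matches(r"0*(11*000*)*", "01", n) for n in range(N + 1)]
recur_b = extend(
    [1, 1],
    lambda a, n: 2 * a[n - 1] - a[n - 2] + (a[n - 3] if n >= 3 else 0), N)

# (c) 0*(1(11)*00*)*{empty, 1(11)*} with a_n = a_{n-1} + 2a_{n-2} - a_{n-3}
counts_c = [count_matches(r"0*(1(11)*00*)*(1(11)*)?", "01", n)
            for n in range(N + 1)]
recur_c = extend([1, 2, 3],
                 lambda a, n: a[n - 1] + 2 * a[n - 2] - a[n - 3], N)

print(counts_a, counts_b, counts_c, sep="\n")
```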
10.4.5.  Here's a way to construct a pile of height $h$. Look at the number of blocks in each column.
The numbers increase to $h$, possibly stay at $h$ for some time, and then fall off. The numbers up to
but not including the first $h$ form a partition of a number with largest part at most $h - 1$ and the
numbers after the first $h$ form a partition of a number with largest part at most $h$. The structures
are these partitions. By the Rule of Product and Exercise 10.4.4, the generating function for piles
of height $h$ is

$$x^h\prod_{i=1}^{h-1}\frac{1}{1-x^i}\,\prod_{i=1}^{h}\frac{1}{1-x^i} = \frac{x^h}{(1-x^h)\prod_{i=1}^{h-1}(1-x^i)^2}.$$

Summing this over all $h > 0$ and adding 1 gives $\sum s_nx^n$. No simple formula is known for the sum.

10.4.7 (a)  We can build the trees by taking a root and joining to it zero, one or two binary RP-trees.
This gives us $T(x) = x(1 + T(x) + T(x)^2)$.

(b)  There is no simple expansion for the square root; however, various things can be done. One
possibility is to use $\sqrt{1 - 2x - 3x^2} = \sqrt{1 - 3x}\,\sqrt{1 + x}$. You can then expand each square root
and multiply the generating functions together. This leads to a summation of about $n$ terms
for $p_n$. The terms alternate in sign. A better approach is to write

$$(1 - 2x - 3x^2)^{1/2} = \sum_k\binom{1/2}{k}(-2x - 3x^2)^k = \sum_{k,j}\binom{1/2}{k}\binom{k}{j}(-2x)^{k-j}(-3x^2)^j.$$

This leads to a summation of about $n/2$ positive terms for $p_n$. It's also possible to get a recursion
by constructing a first order, linear differential equation with polynomial coefficients for $T(x)$
as done in Exercise 10.2.6. Since the recursion contains only two terms, it's the best approach
if we want to compute a table of values. It's also the easiest to program on a computer.
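
For a short table, the functional equation in (a) can also be iterated directly on truncated power series. This is a sketch of that alternative, not the two-term recursion mentioned above; the helper names are illustrative.

```python
def mul_trunc(p, q, n):
    """Multiply coefficient lists p and q, keeping only x^0 .. x^n."""
    out = [0] * (n + 1)
    for i, a in enumerate(p[: n + 1]):
        for j, c in enumerate(q[: n + 1 - i]):
            out[i + j] += a * c
    return out


def tree_counts(n_max):
    """Coefficients of T(x) = x(1 + T + T^2) through x^n_max."""
    t = [0] * (n_max + 1)
    for _ in range(n_max):               # each pass fixes one more coefficient
        t2 = mul_trunc(t, t, n_max)
        inner = [t[i] + t2[i] for i in range(n_max + 1)]
        inner[0] += 1                    # 1 + T + T^2
        t = [0] + inner[:n_max]          # multiply by x and truncate
    return t


print(tree_counts(6))
```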

10.4.9.  The key to working this problem is to never allow the root to have exactly one son.

(a)  Let the number be $r_n$. The generating function for those trees whose root has degree $k$ is $xR(x)^k$.
Since $\sum_{k\ge 0}R(x)^k = 1/(1 - R(x))$, we have $R(x) = \dfrac{x}{1 - R(x)} - xR(x)$. Clearing of fractions
and solving the quadratic,

$$R(x) = \frac{1 + x - \sqrt{1 - 2x - 3x^2}}{2(1 + x)}.$$

(The minus sign is the correct choice for the square root because $R(0) = r_0 = 0$.) These
numbers are closely related to $p_n$ in Exercise 10.4.7. By comparing the equations for the
generating functions,

$$(1 + x)R(x) = x(P(x) + 1)$$

and so $r_n + r_{n-1} = p_{n-1}$ when $n > 1$.

(b)  We modify the previous idea to count by leaves:

$$R(x) = x + \sum_{k\ge 2}R(x)^k.$$

Solving the quadratic:

$$R(x) = \frac{1 + x - \sqrt{1 - 6x + x^2}}{4}.$$

(c)  From (a) we have $2(1 + x)R - 1 - x = -\sqrt{1 - 2x - 3x^2}$ and so, differentiating,

$$2(1 + x)R' + 2R - 1 = \frac{1 + 3x}{\sqrt{1 - 2x - 3x^2}}.$$

Multiplying by $1 - 2x - 3x^2 = (1 + x)(1 - 3x)$, using the first equation to eliminate the square root
and cancelling a factor of $1 + x$, we get $(1 - 3x)(2(1 + x)R' + 2R - 1) = (1 + 3x)(1 - 2R)$. Equating
coefficients of $x^n$ gives us

$$2(n + 1)r_{n+1} - (4n - 2)r_n - 6n\,r_{n-1} = -2r_n - 6r_{n-1}\quad\text{for } n \ge 2.$$

Rearranging and checking initial conditions we have

$$r_{n+1} = \frac{(n - 1)(2r_n + 3r_{n-1})}{n + 1}\quad\text{for } n \ge 2,$$

with $r_0 = r_2 = 0$ and $r_1 = 1$. You should be able to treat (b) in a similar manner. The result
is $r_0 = 0$, $r_1 = r_2 = 1$ and, for $n \ge 2$,

$$r_{n+1} = \frac{3(2n - 1)r_n - (n - 2)r_{n-1}}{n + 1}.$$
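
Recursions of this kind are easy to test against coefficients computed straight from the functional equations. The sketch below iterates $R = x/(1 - R) - xR$ (counting by vertices) and $R = x + \sum_{k\ge 2}R^k$ (counting by leaves) on truncated power series; all function names are illustrative, and the check shown is for the leaf-counting recursion.

```python
def mul_trunc(p, q, n):
    """Multiply coefficient lists p and q, keeping only x^0 .. x^n."""
    out = [0] * (n + 1)
    for i, a in enumerate(p[: n + 1]):
        if a:
            for j, c in enumerate(q[: n + 1 - i]):
                out[i + j] += a * c
    return out


def geometric(r, n):
    """1 + r + r^2 + ... truncated after x^n; requires r[0] == 0."""
    total = [1] + [0] * n
    power = [1] + [0] * n
    for _ in range(n):
        power = mul_trunc(power, r, n)
        total = [s + t for s, t in zip(total, power)]
    return total


def count_by_vertices(n):
    """Coefficients of R(x) = x (1/(1 - R) - R) through x^n."""
    r = [0] * (n + 1)
    for _ in range(n):
        g = geometric(r, n)
        inner = [g[i] - r[i] for i in range(n + 1)]   # drop the k = 1 term
        r = [0] + inner[:n]                           # multiply by x
    return r


def count_by_leaves(n):
    """Coefficients of R(x) = x + sum_{k >= 2} R^k through x^n (n >= 1)."""
    r = [0] * (n + 1)
    for _ in range(n):
        g = geometric(r, n)
        r = [g[i] - r[i] for i in range(n + 1)]       # all k except k = 1
        r[0] -= 1                                     # remove the k = 0 term
        r[1] += 1                                     # add x
    return r


vertices = count_by_vertices(7)
leaves = count_by_leaves(6)
print(vertices, leaves)
```

Any proposed recursion can then be compared term by term with these lists.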

10.4.11 (a)  A tree of outdegree $D$ consists of a root and some number $d \in D$ of trees of outdegree
$D$ joined to the root. Use the Rules of Sum and Product.

(b)  Let $f_0(x) = 1$. Define $f_{n+1}(x) = x\sum_{d\in D} f_n(x)^d$. We leave it to you to prove by induction
that $f_n(x)$ agrees with $T_D(x)$ through terms of degree $n$.

(c)  Except for the $1 \in D$ question, this is handled as we did $T_D(x)$. Why must we have $1 \notin D$? You
should be able to see that there are an infinite number of trees with exactly one leaf: construct
a tree that is just a path of length $n$ from the root to the leaf.

10.4.13.  We can build these trees up the way we built the full binary RP-trees: join two trees at a
root. If we distinguish right and left sons, every case will be counted twice, except when the two sons
are the same. Thus $B(x) = x + \frac{1}{2}(B(x)^2 - E(x)) + E(x)$, where $E(x)$ counts the situation where
both sons are the same and nonempty. We get this by choosing a son and then duplicating it. Thus
each leaf in the son is replaced by two leaves and so $E(x) = B(x^2)$.

10.4.15 (a)  Either the list consists of repeats of just one item OR it consists of a list of the proper
form AND a list of repeats of one item. In the first case we can choose the item in $s$ ways and
use it any number of times from 1 to $k$. In the second case, we can choose the final repeating
item in only $s - 1$ ways since it must differ from the item preceding it.

(b)  After a bit of algebra,

$$A_k(x) = \frac{s/(s-1)}{1 - (s-1)(x + x^2 + \cdots + x^k)} - \frac{s}{s-1} = \frac{s(1-x)/(s-1)}{1 - sx + (s-1)x^{k+1}} - \frac{s}{s-1}.$$

(c)  Multiplying both sides of the formula just obtained for $A_k(x)$ by $1 - sx + (s-1)x^{k+1}$ gives the
desired result.

(d)  Call a sequence of the desired sort acceptable. Add anything to the end of an $n$-long acceptable
sequence. This gives $s\,a_{n,k}$ sequences. Each of these is either an acceptable sequence of length
$n + 1$ or an $(n - k)$-long acceptable sequence followed by $k + 1$ copies of something different
from the last entry in the $(n - k)$-long sequence.
10.4.17.  We have $1 - 3x + x^2 = (1 - ax)(1 - bx)$ where $a = \frac{3 + \sqrt5}{2}$ and $b = \frac{3 - \sqrt5}{2}$. Thus

$$\frac{x}{1 - 3x + x^2} = \frac{1/(a-b)}{1 - ax} - \frac{1/(a-b)}{1 - bx}$$

and so, since $a - b = \sqrt5$, the coefficient of $x^n$ is

$$\frac{a^n - b^n}{\sqrt5}.$$


10.4.19 (a)  The accepting states are unchanged except that if the old start state was accepting,
both the old and new start states are accepting. If there was an edge from the old start state
to state $t$ labeled with input $i$, then add an edge from the new start state to $t$ labeled with $i$.
(The old edge is not removed.) We can express this in terms of the map $f : S \times I \to 2^S$ for
the nondeterministic automaton. Let $s_0 \in S$ be the old start state and introduce a new start
state $s_n$. Let $T = S \cup \{s_n\}$ and define $f^* : T \times I \to 2^T$ by

$$f^*(t, i) = \begin{cases} f(t, i), & \text{if } t \in S,\\ f(s_0, i), & \text{if } t = s_n.\end{cases}$$

(b)  Label the states of $A$ and $B$ so that they have no labels in common. Call their start states $s_A$
and $s_B$. Add a new start state $s_n$ that has edges to all of the states that $s_A$ and $s_B$ did. In other
words, $f^*(s_n, i)$ is the union of $f_A(s_A, i)$ and $f_B(s_B, i)$, where $f_A$ and $f_B$ are the functions for
$A$ and $B$. If either $s_A$ or $s_B$ was an accepting state, so is $s_n$; otherwise the accepting states are
unchanged.

(c)  Add the start state of $S(A)$ to the accepting states. (This allows the machine to accept the
empty string, which is needed since * means "zero or more times.") Run edges from the
accepting states of $S(A)$ to those states that the start state of $S(A)$ goes to. In other words, if $s$
is the start state,

$$f^*(t, i) = \begin{cases} f(t, i), & \text{if } t \text{ is not an accepting state,}\\ f(t, i) \cup f(s, i), & \text{if } t \text{ is an accepting state.}\end{cases}$$

(d)  From each accepting state of $A$, run an edge to each state to which the start state of $B$ has
an edge. The accepting states of $B$ are accepting states. If the start state of $B$ is an accepting
state, then the accepting states of $A$ are also accepting states, otherwise they are not. The
start state is the start state of $A$.
Figure S.11.1  The state transition digraph for covering a 3 by n board with dominoes. Each vertex is
labeled  with  a  triple  that  indicates  whether  commitment  has  been  made  in  that  row  (C)  or  not  made  (N). 
The  start  and  end  states  are  those  with  no  commitments. 

Section  11.1 

11.1.1.  The problem is to eliminate all but the $c$'s from the recursion. One can develop a systematic
method for doing this, but we will not since we have generating functions at our disposal. In this
particular case, let $p_n = f_n + s_n$ and note that $p_n - p_{n-1} = 2c_{n-1}$ by (11.5). Thus, by the first of
(11.4), this result and the last of (11.5),

$$\begin{aligned}
c_{n+1} - c_n &= (2c_n + p_n + c_{n-1}) - (2c_{n-1} + p_{n-1} + c_{n-2})\\
&= 2c_n - c_{n-1} - c_{n-2} + (p_n - p_{n-1})\\
&= 2c_n + c_{n-1} - c_{n-2}.
\end{aligned}$$

11.1.3  (a)   Figure  S.11.1  gives  a  state  transition  digraph. 

Let $a_{n,s}$ be the number of ways to take $n$ steps from the state with no commitments and end in
a state $s$. Let $A_s(x) = \sum_n a_{n,s}x^n$. As in the text, the graph lets us write down the linked equations
for the generating functions. From the graph it can be seen that $A_s$ depends only on the number $k$
of commitments in $s$. Therefore we can write $A_s = B_k$. The linked equations are then

$$\begin{aligned}
B_0(x) &= x(B_3(x) + 2B_1(x)) + 1\\
B_1(x) &= x(B_0(x) + B_2(x))\\
B_2(x) &= xB_1(x)\\
B_3(x) &= xB_0(x),
\end{aligned}$$

which can be solved fairly easily for $B_0(x)$.

(b)  Equate coefficients of $x^n$ on both sides of $(1 - 4x^2 + x^4)A(x) = 1 - x^2$.

(c)  By  looking  at  the  dominoes  in  the  last  two  columns  of  a  board,  we  see  that  it  can  end  in  five 
mutually  exclusive  ways: 


This shows that $a_n$ equals $3a_{n-2}$ plus whatever is counted by the last two of the five cases. A
board of length $n - 2$ ends with either (i) one vertical domino and one horizontal domino
or (ii) three horizontal dominoes. If the vertical dominoes mentioned in (i) are changed to the
left ends of horizontal dominoes, they fit with the last two cases shown above. If the three
horizontal dominoes mentioned in (ii) are removed, we obtain all boards of length $n - 4$. Thus the sum
of the last two cases in the picture plus $a_{n-4}$ equals $a_{n-2}$.
11.1.5. Call the start state a and let $L_{i,j}$ be the number of different single letter inputs that allow
the machine to move from state i to state j. Let $a_{n,i}$ be the number of ways to begin in state a,
recognize n letters and end in state i and let $A_i(x) = \sum_n a_{n,i}x^n$. The desired generating function is the
sum of $A_i$ over all accepting states. A linked set of recursions can be obtained from the automaton
that leads to the generating function equations


11.1.7 (a) We will use induction. It is true for n = 1 by the definition of $m_{x,y} = m^{(1)}_{x,y}$. (It's
also true for n = 0 because the zeroth power of a matrix is the identity and so $m^{(0)}_{x,y} = 1$ if
x = y and 0 otherwise.) Now suppose that n > 1. By the definition of matrix multiplication,
$m^{(n)}_{x,y} = \sum_z m^{(n-1)}_{x,z}\,m_{z,y}$. By the induction hypothesis and the definition of $m_{z,y}$, each term in
the sum is the number of ways to get from x to z in n - 1 steps times the number of ways to
get from z to y in one step. By the Rules of Sum and Product, the proof is complete.

(b) If a is the initial state, $\mathbf{i}M^n\mathbf{a}^t = \sum_y m^{(n)}_{a,y}$, the sum ranging over all accepting states y.

(c) By the previous part, the desired generating function is

$$\sum_{n=0}^{\infty} \mathbf{i}M^n\mathbf{a}^t x^n = \mathbf{i}\Bigl(\sum_{n=0}^{\infty} x^nM^n\Bigr)\mathbf{a}^t = \mathbf{i}\Bigl(\sum_{n=0}^{\infty} (xM)^n\Bigr)\mathbf{a}^t = \mathbf{i}(I - xM)^{-1}\mathbf{a}^t.$$


(d)   The  matrix  M  is  replaced  by  the  table  given  in  the  solution  to  the  previous  exercise. 


Section  11.2 

11.2.1. Theorem. Suppose each structure in a set $\mathcal{T}$ of structures can be constructed from an
ordered partition $(K_1, K_2)$ of the labels, two nonnegative integers $\ell_1$ and $\ell_2$, and some ordered pair
$(T_1, T_2)$ of structures using the labels $K_1$ in $T_1$ and $K_2$ in $T_2$ such that:

(i) The number of ways to choose a $T_i$ with labels $K_i$ and $\ell_i$ unlabeled parts depends only
on i, $|K_i|$ and $\ell_i$.

(ii) Each structure $T \in \mathcal{T}$ arises in exactly one way in this process.

(We allow the possibility of $K_i = \emptyset$ if $\mathcal{T}_i$ contains structures with no labels and likewise for $\ell_i = 0$.)
It then follows that

$$T(x,y) = T_1(x,y)\,T_2(x,y),$$

where $T_i(x,y) = \sum_{n=0}^{\infty}\sum_{m=0}^{\infty} t_{i,n,m}(x^n/n!)\,y^m$ and $t_{i,n,m}$ is the number of ways to choose $T_i$ with labels
n and m unlabeled parts. Define T(x, y) similarly.

The proof is the same as that for the original Rule of Product except that there is a double
sum:

$$t_{n,m} = \sum_{K_1 \subseteq n}\ \sum_{\ell_1=0}^{m} t_{1,|K_1|,\ell_1}\, t_{2,n-|K_1|,m-\ell_1} = \sum_{k=0}^{n}\sum_{\ell_1=0}^{m} \binom{n}{k}\, t_{1,k,\ell_1}\, t_{2,n-k,m-\ell_1}.$$

11.2.3 (a) By the text,

$$\sum_k z(n,k)\,y^k = y(y+1)\cdots(y+n-1).$$

Replacing all but the last factor on the right hand side gives us

$$\sum_k z(n,k)\,y^k = \Bigl(\sum_k z(n-1,k)\,y^k\Bigr)(y + n - 1).$$

Equate coefficients of $y^k$.

(b) For each permutation counted by z(n, k), look at the location of n. There are z(n - 1, k - 1)
ways to construct permutations with n in a cycle by itself. To construct a permutation with
n not in a cycle by itself, first construct one of the permutations counted by z(n - 1, k) AND
then insert n into a cycle. Since there are j ways to insert a number into a j-cycle, the number
of ways to insert n is the sum of the cycle lengths, which is n - 1.
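Parts (a) and (b) both lead to the recursion z(n, k) = z(n - 1, k - 1) + (n - 1) z(n - 1, k). As a hedged illustration (the function names are hypothetical, not from the text), the sketch below builds the table from the recursion and compares each row with the coefficients of y(y+1)...(y+n-1).

```python
def cycle_numbers(n_max):
    """z[n][k] = number of permutations of n with exactly k cycles."""
    z = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    z[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            # n alone in its cycle, OR n inserted into one of n-1 positions.
            z[n][k] = z[n - 1][k - 1] + (n - 1) * z[n - 1][k]
    return z


def rising_factorial_coeffs(n):
    """Coefficients of y(y+1)...(y+n-1), lowest degree first."""
    coeffs = [1]
    for j in range(n):
        new = [0] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):
            new[d + 1] += c      # contribution of the y in the factor (y + j)
            new[d] += j * c      # contribution of the j in the factor (y + j)
        coeffs = new
    return coeffs


z = cycle_numbers(6)
print(z[4][:5])
```

Row n of the table sums to n!, since every permutation has some number of cycles.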

11.2.5 (a) For any particular letter appearing an odd number of times, the generating function is

$$\sum_{n\ \mathrm{odd}} \frac{x^n}{n!} = \frac{e^x - e^{-x}}{2}$$

with Taylor's theorem and some work. We must add 1 to this to allow for the letter not being used.
The Rule of Product is then used to combine the results for A, B and C.

(b) Multiplying out the previous result:

$$\begin{aligned}
\Bigl(1 + \frac{e^x - e^{-x}}{2}\Bigr)^3 &= 1 + 3(e^x - e^{-x})/2 + 3(e^x - e^{-x})^2/4 + (e^x - e^{-x})^3/8 \\
&= 1 + (3e^x/2 - 3e^{-x}/2) + (3e^{2x}/4 - 3/2 + 3e^{-2x}/4) + (e^{3x}/8 - 3e^x/8 + 3e^{-x}/8 - e^{-3x}/8) \\
&= -1/2 + (e^{3x}/8 - e^{-3x}/8) + (3e^{2x}/4 + 3e^{-2x}/4) + (9e^x/8 - 9e^{-x}/8).
\end{aligned}$$


Now  compute  the  coefficients. 
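As a sanity check on the expansion, the coefficients can be compared with a brute-force count. The sketch below assumes, as the solution suggests, that the exercise counts words over {A, B, C} in which each letter is either unused or used an odd number of times; `brute` and `from_egf` are hypothetical names.

```python
from itertools import product


def brute(n):
    """Count length-n words over A, B, C where each letter occurs 0 or an odd number of times."""
    return sum(
        all(w.count(ch) == 0 or w.count(ch) % 2 == 1 for ch in "ABC")
        for w in product("ABC", repeat=n)
    )


def from_egf(n):
    """n! times the coefficient of x^n in the last line of the expansion."""
    # Everything is multiplied by 8 so that the arithmetic stays in integers.
    total = (3**n - (-3)**n) + 6 * (2**n + (-2)**n) + 9 * (1 - (-1)**n)
    if n == 0:
        total -= 4  # the constant term -1/2, times 8
    return total // 8


print([from_egf(n) for n in range(6)])
```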
11.2.7. We saw in this section that $B(x) = \exp(e^x - 1)$. Differentiating:

$$B'(x) = \exp(e^x - 1)\,(e^x - 1)' = B(x)e^x.$$

Equating coefficients of $x^n$:

$$\frac{B_{n+1}}{n!} = \sum_{k=0}^{n} \frac{B_k}{k!}\,\frac{1}{(n-k)!},$$

which gives the result.
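Multiplying the coefficient identity by n! gives $B_{n+1} = \sum_k \binom{n}{k} B_k$. A short Python sketch (with hypothetical function names) computes Bell numbers this way and compares them with a direct count of set partitions.

```python
from math import comb


def bell_numbers(n_max):
    """Bell numbers B_0..B_{n_max} from B_{n+1} = sum_k C(n,k) B_k."""
    B = [1]
    for n in range(n_max):
        B.append(sum(comb(n, k) * B[k] for k in range(n + 1)))
    return B


def count_partitions(n):
    """Count set partitions of an n-set by placing elements one at a time."""
    def rec(i, blocks):
        if i == n:
            return 1
        # Put element i into one of the existing blocks, or start a new block.
        return blocks * rec(i + 1, blocks) + rec(i + 1, blocks + 1)
    return rec(0, 0)


print(bell_numbers(7))
```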

11.2.9 (a) Let $g_{n,k}$ be the number of graphs with n vertices and k components. We have
$\sum_{n,k} g_{n,k}(x^n/n!)\,y^k = \exp(yC(x))$, by the Exponential Formula. Differentiating with respect
to y and setting y = 1 gives us

$$\sum_{n,k} k\,g_{n,k}\frac{x^n}{n!} = \left.\frac{\partial \exp(yC(x))}{\partial y}\right|_{y=1} = H(x).$$

(b) $\sum_k k\,g_{n,k}$ is the number of ways to choose an n-vertex graph and mark a component of it. We
can construct a graph with a marked component by selecting a component (giving C(x)) AND
selecting a graph (giving G(x)).

(c) For permutations, there are (n - 1)! connected components of size n and so

$$C(x) = \sum_{n\ge 1} \frac{(n-1)!\,x^n}{n!} = \sum_{n\ge 1} x^n/n = -\ln(1-x).$$

Since there are n! permutations of n, $G(x) = (1-x)^{-1}$ and the average number of cycles in a
permutation is

$$\frac{h_n}{n!} = [x^n]\ \frac{1}{1-x}\sum_{k\ge 1}\frac{x^k}{k} = \sum_{k=1}^{n}\frac{1}{k},$$

where $h_n/n!$ is the coefficient of $x^n$ in H(x).

(d) Since $C(x) = e^x - 1$, we have $H(x) = (e^x - 1)\exp(e^x - 1)$ and so

$$h_n = \sum_{k=1}^{n}\binom{n}{k}B_{n-k} = \sum_{k=0}^{n}\binom{n}{k}B_{n-k} - B_n,$$

which is $B_{n+1} - B_n$ by the previous exercise.

(e) Since $C(x) = x + x^2/2$, we have $H(x) = (x + x^2/2)I(x)$, where I(x) is the EGF for $i_n$, the
number of involutions of n. Thus the average number of cycles in an involution of n is

$$\frac{\binom{n}{1}i_{n-1} + \binom{n}{2}i_{n-2}}{i_n} = \frac{n}{2}\Bigl(1 + \frac{i_{n-1}}{i_n}\Bigr),$$

where the right side comes from the recursion $i_n = i_{n-1} + (n-1)i_{n-2}$.
11.2.11. Suppose n > 1. Since f is alternating,

• k is even;

• f(1), ..., f(k - 1) is an alternating permutation of {f(1), ..., f(k - 1)};

• f(k + 1), ..., f(n) is an alternating permutation of {f(k + 1), ..., f(n)}.

Thus, an alternating permutation of n for n > 1 is built from an alternating permutation of
odd length AND an alternating permutation, such that the sum of the lengths is n - 1. We
have shown that

$$\sum_{n>1} a_n\frac{x^{n-1}}{(n-1)!} = B(x)A(x)$$

and so $A'(x) = B(x)A(x) + 1$. Similarly, $B'(x) = B(x)B(x) + 1$.

Separate variables in $B' = B^2 + 1$ and use B(0) = 0 to obtain $B(x) = \tan x$. Use the
integrating factor cos x for

$$A'(x) = (\tan x)A(x) + 1$$

and the initial condition A(0) = 1 to obtain $A(x) = \tan x + \sec x$.
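The two differential equations translate into recursions for the coefficients, which can be compared with a direct count. This Python sketch assumes "alternating" means f(1) < f(2) > f(3) < ..., which matches the claim that k is even; the function names are hypothetical.

```python
from itertools import permutations
from math import comb


def count_alternating(n):
    """Brute-force count of permutations with f(1) < f(2) > f(3) < ..."""
    def ok(p):
        return all(
            (p[i] < p[i + 1]) if i % 2 == 0 else (p[i] > p[i + 1])
            for i in range(len(p) - 1)
        )
    return sum(ok(p) for p in permutations(range(n)))


def from_odes(n_max):
    """EGF coefficients a_n, b_n from A' = BA + 1 and B' = B^2 + 1, with a_0 = 1, b_0 = 0."""
    a, b = [1], [0]
    for n in range(n_max):
        extra = 1 if n == 0 else 0   # the "+ 1" only affects the constant term
        a.append(sum(comb(n, k) * b[k] * a[n - k] for k in range(n + 1)) + extra)
        b.append(sum(comb(n, k) * b[k] * b[n - k] for k in range(n + 1)) + extra)
    return a, b


a, b = from_odes(8)
print(a)
```

The list `b` holds the coefficients of tan x (nonzero only for odd n), and `a` holds those of tan x + sec x.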
11.2.13 (a) The square of a k-cycle is

• another cycle of length k if k is odd;

• two cycles of length k/2 if k is even.

Using this, we see that the condition is necessary. With further study, you should be able to
see how to take a square root of such a permutation.

(b) This is simply putting together cycles of various lengths using (a) and recalling that there are
(k - 1)! k-cycles.

(c) By bisection, $\displaystyle\sum_{k\ge 1,\ k\ \mathrm{odd}} \frac{x^k}{k} = \frac{1}{2}\bigl(\{-\ln(1-x)\} - \{-\ln(1-(-x))\}\bigr)$.

(d) We don't know of an easier method.
11.2.15 (a) We'll give two methods. First, we use the Exponential Formula approach in Example 11.14.
When we add a root, the number of leaves does not change except when we started
with nothing and ended up with a single vertex tree. Correcting for this exception gives us the
formula.

Without the use of the Exponential Formula, we could partition the trees according to
the degree k of the root, treating k = 0 specially because in this case the root is a leaf:

$$L(x,y) = xy + x\sum_{k=1}^{\infty} L(x,y)^k/k!.$$

(b) This type of problem was discussed in Section 10.3. Recall that there are $n^{n-1}$ n-vertex rooted
labeled trees.

(c) Differentiate the equation in (a) with respect to y and set y = 1 to obtain

$$U(x) = xe^{L(x,1)}U(x) + x = T(x)U(x) + x,$$

where we have used the fact that $L(x,1) = T(x) = xe^{T(x)}$. Solving for U: $U = \frac{x}{1-T}$. Differentiating
$T(x) = xe^{T(x)}$ and solving for $T'(x)$ gives us $T' = \frac{T}{x(1-T)}$. Thus $x^2T' + x = \frac{x}{1-T}$, which
gives the equation for U(x).

We know that $t_n = n^{n-1}$. It follows from the equation for U(x) that

$$\frac{u_n}{n!} = \frac{(n-1)^{n-2}}{(n-2)!}.$$

Thus $u_n/t_n = n/(1+x)^{1/x}$, where $x = \frac{1}{n-1}$. As $n \to \infty$, $x \to 0$ and, by l'Hôpital's Rule,
$(1+x)^{1/x} = \exp\bigl(\frac{\ln(1+x)}{x}\bigr) \to \exp(1) = e$.

11.2.17 (a) There are several steps.

• Since g is a function, each vertex of $\varphi(g)$ has outdegree 1. Thus the image of $n^n$ lies in $\mathcal{F}_n$.

• $\varphi$ is an injection: if $\varphi(g) = \varphi(h)$, then (x, g(x)) = (x, h(x)) for all $x \in n$ and so g = h.

• Finally $\varphi$ is onto $\mathcal{F}_n$. If $(n, E) \in \mathcal{F}_n$, for each $x \in n$ there is an edge $(x, y) \in E$. Define
g(x) = y.

(b) Let's think in terms of a function g corresponding to the digraph. Let $k \in n$. If the equation
$g^s(k) = k$ has a solution, then k is on a cycle and will be the root of a tree. The other vertices of
the tree are those $j \in n$ for which $g^s(j) = k$ for some s.

(c) This is simply an application of Exercise 11.2.2.

(d) In the notation of Theorem 11.6, T(x) is T(x), $f(t) = e^t$ and $g(u) = -\ln(1-u)$. Thus $n(f_n/n!)$
is the coefficient of $u^{n-1}$ in $e^{nu}(1-u)^{-1}$. Using the convolution formula for the coefficient of a
product of power series, we obtain the result.
Section 11.3


11.3.1 (a) AB = CD follows from $a_i + b_i = \min(a_i, b_i) + \max(a_i, b_i)$. C divides A if and only if
$c_i \le a_i$ for all i. Thus C divides both A and B if and only if $c_i \le \min(a_i, b_i)$ for all i. Thus
C = gcd(A, B). The claim that D = lcm(A, B) follows similarly.

(b) It follows from the definition of gcd that gcd(n, i) must divide both n and i. This completes
the first part. From (a), we have gcd(ab, ac) = a gcd(b, c). Apply this with ab = n and a = k:
gcd(n, i) = k gcd(n/k, i/k).

(c) By letting j = n/k, we can see that the two forms of the sum are equivalent, so we need only
prove one. Let g generate $C_n$, that is g(i) = i + 1 modulo n. Let $h = g^t$ so that h(i) = i + t
modulo n. Thus $h^k(i) = i + kt$ modulo n. The smallest k > 0 such that $h^k$ is the identity is thus the
smallest k such that kt is a multiple of n. Also, the cycle form of h consists of k-cycles. Since
there are n elements, there must be n/k k-cycles. Thus h contributes $z_k^{n/k}$ to the sum. We need
to know how many values of t give a particular value of k. By looking at prime factorization,
you should be able to see that k gcd(n, t) = n and so gcd(n, t) = n/k. We can now use (b) with
k replaced by n/k to conclude that the number of such t is $\varphi(n/(n/k)) = \varphi(k)$.
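The counting claim in (c) is easy to test for small n. In this Python sketch (hypothetical function names; the totient is computed naively from its definition), each t in 0..n-1 is mapped to the common cycle length k of i -> i + t mod n.

```python
from math import gcd


def totient(k):
    """Euler's function: how many j in 1..k are relatively prime to k."""
    return sum(1 for j in range(1, k + 1) if gcd(j, k) == 1)


def cycle_length(n, t):
    """Smallest k > 0 such that k*t is a multiple of n."""
    k, pos = 1, t % n
    while pos != 0:
        pos = (pos + t) % n
        k += 1
    return k


def cycle_length_counts(n):
    """Map each cycle length k to the number of t in 0..n-1 producing it."""
    counts = {}
    for t in range(n):
        k = cycle_length(n, t)
        counts[k] = counts.get(k, 0) + 1
    return counts


print(cycle_length_counts(12))
```

For n = 12 the divisors 1, 2, 3, 4, 6, 12 should appear with multiplicities equal to their totients.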

11.3.3  (a)  A  regular  octahedron  can  be  surrounded  by  a  cube  so  that  each  vertex  of  the  octahedron 
is  the  center  of  a  face  of  the  cube.  The  center  of  the  octahedron  is  the  center  of  the  cube.  A  line 
segment  from  the  center  of  the  cube  to  a  vertex  of  the  cube  passes  through  the  center  of  the 
corresponding  face  of  the  octahedron.  A  line  segment  from  the  center  of  the  cube  to  the  center 
of  an  edge  of  the  cube  passes  through  the  center  of  the  corresponding  edge  of  the  octahedron. 

(b)  By  the  above  correspondence,  the  answer  will  be  the  same  as  the  symmetries  of  the  cube  acting 
on  the  faces  of  the  cube.  See  (11.31). 

(c)  By  the  above  correspondence  it  is  the  same  as  the  answer  for  the  edges  of  the  cube.  See  the 

previous  exercise. 

11.3.5. The group is usually called $S_4$. Here are its 4! = 24 elements:

• The identity, which gives $x_1^4$.

• (4 - 1)! = 6 elements which are 4-cycles, which give $6x_4$.

• $\binom{4}{2} = 6$ elements which consist of two 1-cycles and a 2-cycle, giving $6x_1^2x_2$.

• $\binom{4}{1} \times (3-1)! = 8$ elements which consist of a 1-cycle and a 3-cycle, giving $8x_1x_3$.

• $\frac{1}{2}\binom{4}{2} = 3$ elements which consist of two 2-cycles, giving $3x_2^2$.


Thus the cycle index is

$$\frac{x_1^4 + 6x_4 + 6x_1^2x_2 + 8x_1x_3 + 3x_2^2}{24}.$$
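The five counts can be confirmed by listing all 24 permutations and tallying their cycle types. This is only an illustrative Python sketch; `cycle_type` is a hypothetical helper.

```python
from collections import Counter
from itertools import permutations


def cycle_type(p):
    """Sorted tuple of cycle lengths of the permutation i -> p[i]."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))


types = Counter(cycle_type(p) for p in permutations(range(4)))
print(dict(types))
```

Each cycle type (1,1,2), for example, corresponds to the monomial $x_1^2x_2$, and its count is the coefficient in the numerator.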


Now  apply  Theorem  11.9: 


11.3.7. If the vertices belong to different cycles of length i and j > i we get $x_{\mathrm{lcm}(i,j)}^{\gcd(i,j)}$ as in the
digraph case and so we get


n(^WiJ)) ''''''' 
If the two cycles have the same length, we must be careful not to overcount because the edge is not
directed.  When  the  two  vertices  are  in  different  cycles  of  length  i  we  get  and  there  are  (2*)  such 
pairs  of  cycles.  When  the  two  vertices  belong  to  the  same  cycle  of  length  i,  we  must  be  extra  careful: 
If  the  separation  between  the  vertices  on  the  cycle  is  i/2,  the  edge  {u,  v}  comes  back  after  i/2  steps 
around  the  cycle  as  {v,u},  so  to  speak.  Otherwise,  it  must  go  i  steps  around  the  cycle.  Thus  we  get 
a  contribution  of  either 

Putting  this  all  together,  Yl  becomes 

i<3  i  i  odd  i  even 


Section  11.4 

11.4.1 (a) We have $r^2 = 2r + 1$ and so $r = 1 + \sqrt{2}$ and m = 1. By the principle, we expect there
is some constant A such that $a_n \sim A(1 + \sqrt{2})^n$.

(b) Since $A(x) = \frac{1+x}{1-2x-x^2}$, we see that $p(x) = 1 + x$, $q(x) = 1 - 2x - x^2$, $r = \sqrt{2} - 1 = 1/(1 + \sqrt{2})$
and $q'(r) = -2\sqrt{2}$. Thus k = 1 and we have

$$a_n \sim \frac{-p(r)}{q'(r)\,r^{n+1}} = \frac{(1+\sqrt{2})^{n+1}}{2}.$$
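Substituting $p(r) = \sqrt{2}$ and $q'(r) = -2\sqrt{2}$ into the principle gives the estimate $(1+\sqrt{2})^{n+1}/2$. A quick numerical comparison is sketched below in Python; the recursion $a_n = 2a_{n-1} + a_{n-2}$ with $a_0 = 1$, $a_1 = 3$ is read off from the denominator and numerator of A(x), and the function names are hypothetical.

```python
from math import sqrt


def coefficients(n_max):
    """Coefficients of (1 + x)/(1 - 2x - x^2) for n = 0..n_max."""
    a = [1, 3]
    for _ in range(2, n_max + 1):
        a.append(2 * a[-1] + a[-2])
    return a[: n_max + 1]


def estimate(n):
    """Asymptotic estimate (1 + sqrt 2)^(n+1) / 2."""
    return 0.5 * (1 + sqrt(2)) ** (n + 1)


a = coefficients(30)
for n in (0, 1, 5, 10):
    print(n, a[n], estimate(n))
```

The error term is $(1-\sqrt{2})^{n+1}/2$, which is less than 1/2 in absolute value, so rounding the estimate recovers $a_n$ exactly as long as floating point precision holds out.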


(c) We have $1 - 2x - x^2 = (1 - ax)(1 - bx)$ where $a = 1 + \sqrt{2}$ and $b = 1 - \sqrt{2}$. Expanding by
partial fractions:

$$\begin{aligned}
\frac{1+x}{1-2x-x^2} &= \frac{1}{1-2x-x^2} + \frac{x}{1-2x-x^2} \\
&= \Bigl(\frac{a/(a-b)}{1-ax} - \frac{b/(a-b)}{1-bx}\Bigr) + \Bigl(\frac{1/(a-b)}{1-ax} - \frac{1/(a-b)}{1-bx}\Bigr) \\
&= \frac{(2+\sqrt{2})/(2\sqrt{2})}{1-ax} - \frac{(2-\sqrt{2})/(2\sqrt{2})}{1-bx}.
\end{aligned}$$

Thus $a_n = \frac{1}{2}(1+\sqrt{2})^{n+1} + \frac{1}{2}(1-\sqrt{2})^{n+1}$.

11.4.3. From the discussion in the example, you can see that merging two lists of lengths i and
$j \ge i$ takes at least i comparisons. Thus the example shows that the number of comparisons for
merge sorting satisfies $T(n) = f(n) + T(m) + T(n-m)$ where $m = \lfloor n/2 \rfloor$ and $m \le f(n) < n$. Apply
Principle 11.3.

11.4.5. We'll use Principle 11.4 (p. 345) so $t_{n,k}$ will denote the kth term of the sum we're given.
(a) Since

$$\frac{t_{n,k+1}}{t_{n,k}} = \frac{n-k}{n}$$

is less than 1 and is close to 1 when k/n is small, we'll use Principle 11.5 (p. 346). Since

$$\frac{1-r_k}{k} = \frac{1-(n-k)/n}{k} = \frac{1}{n},$$

(11.38)  gives  the  estimate 


im  n\  n 


2  n! 

(b) This is a bit more complicated than (a) since

$$\frac{t_{n,k+1}}{t_{n,k}} = \frac{k+1}{k}\cdot\frac{n-k}{n}$$

is greater than 1 for small k and less than 1 for k near n. Thus $t_{n,k}$ achieves its maximum
somewhere between 1 and n, namely, when the above ratio equals 1. This leads to a quadratic
equation for k which has the solution

$$k = \frac{-1 + \sqrt{1+4n}}{2}.$$

Since this differs from $\sqrt{n}$ by at most a constant, we'll split the sum into two pieces at $k = \sqrt{n}$
and use Principle 11.5 (p. 346) for each half. Since each half has the same estimate, we simply
double one result. Ignoring the fact that $\sqrt{n}$ is not an integer, we set $k = \sqrt{n} + j$ and use $j \ge 0$
as the new index of summation. Call the new terms $t'_{n,j}$. We have

$$\begin{aligned}
r_j = \frac{t'_{n,j+1}}{t'_{n,j}} = \frac{t_{n,k+1}}{t_{n,k}} &= \frac{k+1}{k}\cdot\frac{n-k}{n} = \frac{\sqrt{n}+j+1}{\sqrt{n}+j}\cdot\frac{n-\sqrt{n}-j}{n} \\
&= 1 + \frac{n - (\sqrt{n}+j)^2 - (\sqrt{n}+j)}{n(\sqrt{n}+j)} \\
&= 1 - \frac{2j\sqrt{n} + j^2 + \sqrt{n} + j}{n(\sqrt{n}+j)} \\
&\approx 1 - \frac{2j\sqrt{n}}{n\sqrt{n}} = 1 - 2j/n.
\end{aligned}$$

Thus $(1 - r_j)/j \approx 2/n$ and so we obtain the following approximation (the factor of 2 is due to
the presence of two sums)

$$2\sqrt{\pi n/4}\ t'_{n,0} = \frac{\sqrt{\pi n}\,\sqrt{n}\ n!}{n^{\sqrt{n}}\,(n-\sqrt{n})!},$$

where $(n - \sqrt{n})!$ should be approximated using Stirling's formula (Theorem 1.5 (p. 12)) since
we have no formula for x! when x is not a positive integer.

11.4.7. Use Principle 11.6 (p. 349) with r = 1, b = 0 and c = -1 to obtain

^  kes 
11.4.9 (a) The function has radius of convergence r = 1 and has a singularity at -1 = -r. Another
reason we can't use it for $A_e(x)$ is that $a_{e,n} = 0$ whenever n is odd.

(b) For both cases, r = 1, b = 0 and c = -1/2. We obtain $L = \sqrt{2}$ for $A_o(x)$ and $L = 1/\sqrt{2}$ for
$A_e(x)$.

(c) By power series, $a_{e,2n} = (-1)^n\binom{-1/2}{n}(2n)!$, which can be rearranged to give the answer. By
Stirling's formula, $a_{e,2n} \sim (2n)!/\sqrt{\pi n} \sim 2(2n/e)^{2n}$.

(d) Since $A_o(x) = (1+x)A_e(x)$, we have $a_{o,2n} = a_{e,2n}$ and $a_{o,2n+1} = (2n+1)a_{e,2n}$. By the previous
part, $a_{o,2n} = \binom{2n}{n}(2n)!\,4^{-n}$ and $a_{o,2n+1} = \binom{2n}{n}(2n+1)!\,4^{-n}$.

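11.4.11. We use Principle 11.6 (p. 349) with

$$A(x) = (1 - 2x - 3x^2)^{-1/2}, \quad b = 0 \quad\text{and}\quad c = -1/2.$$

Since $1 - 2x - 3x^2 = (1-3x)(1+x)$, r = 1/3 and

$$L = \lim_{x\to 1/3}\frac{(1-2x-3x^2)^{-1/2}}{(1-3x)^{-1/2}} = \lim_{x\to 1/3}\frac{1}{\sqrt{1+x}} = \frac{\sqrt{3}}{2}.$$

Thus

$$a_n \sim \frac{\sqrt{3}}{2}\cdot\frac{3^n n^{-1/2}}{\Gamma(1/2)} = \frac{3^{n+1/2}}{2\sqrt{\pi n}}.$$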
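The estimate $3^{n+1/2}/(2\sqrt{\pi n})$ can be compared with exact coefficients. The Python sketch below is an illustration only: the recursion $n a_n = (2n-1)a_{n-1} + 3(n-1)a_{n-2}$ is not from the text but is obtained here by differentiating A(x), which gives $(1-2x-3x^2)A'(x) = (1+3x)A(x)$, and equating coefficients. The function names are hypothetical.

```python
from math import pi, sqrt


def series_coefficients(n_max):
    """a_n = coefficient of x^n in (1 - 2x - 3x^2)^(-1/2), for n = 0..n_max."""
    a = [1, 1]
    for n in range(2, n_max + 1):
        # The division is exact because the coefficients are integers.
        a.append(((2 * n - 1) * a[n - 1] + 3 * (n - 1) * a[n - 2]) // n)
    return a[: n_max + 1]


def estimate(n):
    """Asymptotic estimate 3^(n + 1/2) / (2 sqrt(pi n))."""
    return 3 ** (n + 0.5) / (2 * sqrt(pi * n))


a = series_coefficients(200)
for n in (10, 50, 200):
    print(n, a[n] / estimate(n))
```

The printed ratios should approach 1 as n grows, which is all that the asymptotic statement promises.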

11.4.13. We use Principle 11.6 (p. 349). Since $(1 + x^2)^2 - 4x$ vanishes at x = r = 0.295597742...
and is positive whenever -r < x < r, we have found r. Thus we take


X       n       /  M/2          ^^        -1   /  (1  +  a;2)2  -  4a;  1  +  a;^ 

/(a;)  =  (l-a;/r)  /  ,       g{x)  =  — W  ^  _    and       /i(a;)  =  . 


We  have 


-V(l  +  a;2)2  -4a;          1   /,.     (1  -  a;2)2  -  4a; 
L  =  lim  —  ^         =  —  A  hm  . 

x^r  _  r^jnf  2  y  a:^r        1  -  xjr 


By using l'Hôpital's Rule, we obtain


L  =  -Ij^llLj^) — i  =  -yr  +  r2(l-r2)  =  -0.61265067. 
z  \/  i  /  r 


l/r 

11.4.15. By techniques we have used before,

$$H(x) = x\sum_{k\ge 2} H(x)^k + x.$$

Sum the geometric series and use algebra to obtain the desired quadratic equation for H(x).

This quadratic could be treated as an implicit equation for H(x) and we could apply Principle 11.7
(p. 353). Alternatively, we could solve the quadratic for H(x) and use Principle 11.6
(p. 349). For Principle 11.7, let $F(x,y) = y^2 - y + \frac{x}{1+x}$. Then $F_y(x,y) = 2y - 1$ and so s = 1/2 and
$\frac{r}{1+r} = 1/2 - (1/2)^2$, which yields r = 1/3. For Principle 11.6,

$$H(x) = \frac{1 - \sqrt{1 - 4x/(1+x)}}{2} = \frac{1 - \sqrt{(1-3x)/(1+x)}}{2}.$$

Thus r = 1/3, $f(x) = (1 - x/r)^{1/2}$, $g(x) = -\frac{1}{2}(1+x)^{-1/2}$ and h(x) = 1/2. In any case, the answer
is

V3  3" 
4v7rn'> 

11.4.17.  We  will  use  Principle  11.6  (p.  349).  We  want  to  solve 


for  r  because  then  we  have  a  singularity  due  to  division  by  zero.  This  can  always  be  done: 

•  The  radius  of  convergence  of  the  sum  is  oo. 

•  The  sum  vanishes  at  r  =  0. 

•  The  sum  is  increasing  and  unbounded  as  r  ^  +oo. 

Having found r, we let $f(x) = (1 - x/r)^{-1}$. Then $g(x) = (1 - x/r)A(x)$ and, by l'Hôpital's
Rule,

T  ^  y  1  -  a^/T        ^   1  

Let d = gcd(D). You should be able to see that $a_n = 0$ when n is not a multiple of d. Hence
we'll need to assume that d = 1. (Actually you can get around this by setting $x^d$ = new variable.)
The  answer  for  general  d  is 

d  Til 

when  d  divides  n 


E^V(fc-i)! 


and  a„  =  0,  otherwise. 
11.4.19  (a)   We  have 


k-t 


Apply  the  principle  to  each  term  in  the  sum.  Since  c  <  0,  the  largest  contribution  comes  from 
the  t  =  k  term.  Thus  6„  ~  g{rYn~'^^~^ /T{~ck). 

(b)  Proceed  as  in  the  previous  part.  Since  c  >  0,  the  largest  contribution  is  now  from  t=l  and  so 

(c) The formula for B(x) is just a bit of algebra. The only singularity on [-r, r] is due to
$f(x) = (1 - x/r)^{1/2}$.

(d) Since A(x) is a sum of nonnegative terms, it is an increasing function of x and so A(x) = 1 has
at most one positive solution. We take b = 0, c = -1 and $f(x) = (1 - x/s)^{-1}$ in Principle 11.6.
Then

$$\lim_{x\to s}\frac{(1 - A(x))^{-1}}{(1 - x/s)^{-1}} = \lim_{x\to s}\frac{1 - x/s}{1 - A(x)} = \frac{1}{sA'(s)}$$

by l'Hôpital's Rule.

Suppose that c < 0. Note that A(0) = 0 and that A(x) is unbounded as $x \to r$ because
$A(x)/(1 - x/r)^c$ approaches a nonzero limit. Thus A(x) = 1 has a solution in (0, r) by the
Intermediate Value Theorem for continuous functions.

(e) If we could deal with $e^{s(x)}$ using Principle 11.6, we could multiply g(x) and h(x) in the
principle by $e^{s(x)}$ where s(x) is either of the sums given in this exercise. Then we can apply
Principle 11.6.
11.4.21 (b) Let U(x) = T(x)/x. The equation for U is $U = \sum_{d\in D} x^dU^d/d!$. Replacing $x^k$ with z, we
see that this leads to a power series for U in powers of $z = x^k$. Thus the coefficients of $x^m$ in
U(x) will be 0 when m is not a multiple of k.

(c) We apply Principle 11.7 (p. 353) with

$$F(x,y) = y - x\sum_{d\in D} y^d/d! \quad\text{and}\quad F_y(x,y) = 1 - x\sum_{d\in D,\ d\ne 0} y^{d-1}/(d-1)!.$$

Using $F(r,s) = 0 = F_y(r,s)$ and some algebra, we obtain

$$\sum_{d\in D,\ d\ne 0} (d-1)s^d/d! = 1 \quad\text{and}\quad r = \Bigl(\sum_{d\in D,\ d\ne 0} s^{d-1}/(d-1)!\Bigr)^{-1}.$$

Once the first of these has been solved numerically for s, the rest of the calculations are
straightforward.


Appendix  A 

A.1. A(n) is the formula $1 + 3 + \cdots + (2n-1) = n^2$ and $n_0 = n_1 = 1$. A(1) is just $1 = 1^2$. To prove
A(n + 1) in the inductive step, use A(n):

$$(1 + 3 + \cdots + (2n-1)) + (2n+1) = n^2 + (2n+1) = (n+1)^2.$$

A.3. Let A(n) be $\sum_{k=1}^{n}(-1)^{k-1}k^2 = (-1)^{n-1}\sum_{k=1}^{n}k$. By (A.1), we can replace the right hand side
of A(n) by $(-1)^{n-1}n(n+1)/2$, which we will do. It is easy to verify A(1). As usual, the induction
step uses A(n - 1) to evaluate $\sum_{k=1}^{n-1}(-1)^{k-1}k^2$ and some algebra to prove A(n) from this.

What would have happened if we hadn't thought to use (A.1)? The proof would have gotten
more complicated. To prove A(n) we would have needed to prove that

$$(-1)^{n-2}\sum_{k=1}^{n-1}k + (-1)^{n-1}n^2 = (-1)^{n-1}\sum_{k=1}^{n}k.$$

At this point, we would have to prove this result separately by induction or prove it using (A.1).

A.5. The claim is true for n = 1. For n + 1, we have

$$(x^{n+1})' = (x^nx)' = (x^n)'x + (x^n)x' = (nx^{n-1})x + x^n,$$

where the last used the induction hypothesis. Since the right side is $(n+1)x^n$, we are done.

A.7. The inductive step only holds for $n \ge 3$ because the claim that $P_{n-1}$ belongs to both groups
requires $n - 1 \ge 2$; however, A(2) was never proved. (Indeed, if A(2) is true, then A(n) is true for
all n.)

A. 9.  This  is  obviously  true  for  n  =  1.  Suppose  we  have  a  numbering  when  n  —  1  lines  have  been 
drawn.  The  nth  line  divides  the  plane  into  two  parts,  say  A  and  B.  Assign  all  regions  in  A  the  same 
number  they  had  with  n  —  1  lines  and  reverse  the  numbering  in  B. 
Section B.1

B.1.1. We'll omit most cases where the functions must be nonnegative. Also, the proofs for O
properties are omitted because they are like those for Θ without the "A" part of the inequalities.
All inequalities are understood to hold for some A's and B's and all sufficiently large n.

(a) Note that g(n) is Θ(f(n)) if and only if there are positive constants B and C such that
$|g(n)| \le B|f(n)|$ and $|f(n)| \le C|g(n)|$. Let A = 1/C. Conversely, if $A|f(n)| \le |g(n)| \le B|f(n)|$,
let C = 1/A and reverse the previous steps.

(b) These follow easily from the definition.

(c) These follow easily from the definition.

(d) We do Θ using (a). Let $A' = A|C/D|$ and $B' = B|C/D|$. Then $A'|Df(n)| \le |Cg(n)| \le
B'|Df(n)|$.

(e) Use (a): We have $(1/B)|g(n)| \le |f(n)| \le (1/A)|g(n)|$.

(f) See proof in the text.

(g) We do Θ( ) and use (a). We have $A_i|f_i(n)| \le |g_i(n)| \le B_i|f_i(n)|$. Multiplying and using (a)
with $A = A_1A_2$ and $B = B_1B_2$ gives the first result. To get the second part, divide the
i = 1 inequalities by the i = 2 inequalities (remember to reverse direction!) and use (a) with
$A = A_1/B_2$ and $B = B_1/A_2$.

(h) See proof in the text.

(i) This follows immediately from (h) if we drop the subscripts.

B.1.3. This is not true. For example, n is $O(n^2)$, but $n^2$ is not O(n).

B.1.5  (a)    Hint:  There  is  an  explicit  formula  for  the  sum  of  the  squares  of  integers. 

(b)  Hint:  There  is  an  explicit  formula  for  the  sum  of  the  cubes  of  integers. 

(c)  Hint:  If  you  know  calculus,  upper  and  lower  Riemann  sum  approximations  to  the  integral  of 
f{x)  =  x^l"^  can  be  used  here. 

B.1.7 (a) Here's a chart of values.

    n                      5          10        30          100       300

    n^2                    25         10^2      9 x 10^2    10^4      9 x 10^4
    100n                   5 x 10^2   10^3      3 x 10^3    10^4      3 x 10^4
    100(2^{n/10} - 1)      41         10^2      7 x 10^2    10^5      10^11

    fastest                A          A, C      C           A, B      B
    slowest                B          B         B           C         C

(b) When n is very large, B is fastest and C is slowest. This is because (i) of two polynomials
the one with the lower degree is eventually faster and (ii) an exponential function grows faster
than any polynomial.

B.1.9. Let $p(n) = \sum_{i=0}^{k} b_in^i$ with $b_k > 0$.

(a) Let $s = \sum_{i=0}^{k-1}|b_i|$ and assume that $n \ge 2s/b_k$. We have

$$|p(n) - b_kn^k| \le \Bigl|\sum_{i=0}^{k-1} b_in^i\Bigr| \le \sum_{i=0}^{k-1}|b_i|n^i \le \sum_{i=0}^{k-1}|b_i|n^{k-1} = sn^{k-1} \le b_kn^k/2.$$

Thus $|p(n)| \ge b_kn^k - b_kn^k/2 \ge (b_k/2)n^k$ and also $|p(n)| \le b_kn^k + b_kn^k/2 \le (3b_k/2)n^k$.

(b) This follows from (a) of the theorem.

(c) By applying l'Hôpital's Rule k times, we see that the limit of $p(n)/a^n$ is
$\lim_{n\to\infty} (k!/(\log a)^k)/a^n$, which is 0.

(d) By the proof of the first part, $p(n) \le (3b_k/2)n^k$ for all sufficiently large n. Thus we can take
$C \ge 3b_k/2$.

(e) For $a^{p(n)}$ to be $\Theta(a^{Cn^k})$, we must have positive constants A and B such that $A \le a^{p(n)}/a^{Cn^k} \le
B$. Taking logarithms gives us $\log_a A \le p(n) - Cn^k \le \log_a B$. The center of this expression is a
polynomial which is not constant unless $p(n) = Cn^k + D$ for some constant D, the case which
is ruled out. Thus $p(n) - Cn^k$ is a nonconstant polynomial and so is unbounded.

B.1.11 (a) The worst time possibility would be to run through the entire loop because the "If"
always fails. In this case the running time is Θ(n). This actually happens for the permutation
$a_i = i$ for all i.

(b) Let $N_k$ be the number of permutations which have $a_{i-1} < a_i$ for $2 \le i \le k$ and $a_k > a_{k+1}$.
(There is an ambiguity about what to do for the permutation $a_i = i$ for all i, but it contributes
a negligible amount to the average running time.) The "If" statement is executed k times for
such permutations. Thus the average number of times the "If" is executed is $\sum_k kN_k/n!$. If the
$a_i$'s were chosen independently one at a time from all the integers so that no adjacent ones are
equal, the chances that all the k inequalities $a_1 < a_2 < \cdots < a_k > a_{k+1}$ hold would be $(1/2)^k$.
This would give $N_k/n! = (1/2)^k$ and then $\sum_{k=0}^{\infty} kN_k/n!$ would converge by the "ratio test."
This says that the average running time is bounded for all n. Unfortunately the $a_i$'s cannot be
chosen as described to produce a permutation of n.

We need to determine $N_k$. With each arbitrary permutation $a_1, a_2, \ldots$ we can associate a
set of permutations $b_1, b_2, \ldots$ counted by $N_k$. We'll call this the set for $a_1, a_2, \ldots$. For i > k+1,
$b_i = a_i$, and $b_1, \ldots, b_{k+1}$ is a rearrangement of $a_1, \ldots, a_{k+1}$ to give a permutation counted by
$N_k$. How many such rearrangements are there? $b_{k+1}$ can be any but the largest of the $a_i$'s
and the remaining $b_i$'s must be the remaining $a_i$'s arranged in increasing order. Thus there
are k possibilities and so the set for $a_1, a_2, \ldots$ has k elements. Hence the set associated with
$a_1, a_2, \ldots$ contains k permutations counted by $N_k$. Since there are n! permutations, we have
a total of n!k things counted by $N_k$; however, each permutation $b_1, b_2, \ldots$ counted by $N_k$
appears in many sets. In fact it appears (k + 1)! times since any rearrangement of the first k+1 $b_i$'s
gives a permutation that has $b_1, b_2, \ldots$ in its set. Thus the number of things in all the sets is
$N_k(k+1)!$. Consequently, $N_k = n!\,k/(k+1)!$.

By the previous paragraphs, the average number of times the "If" is executed is
$\sum_k k^2/(k+1)!$, which approaches some constant. Thus the average running time is Θ(1).
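The formula $N_k = n!\,k/(k+1)!$ can be checked by brute force for small n. The Python sketch below uses a hypothetical helper, `first_descent_counts`, that scans each permutation for the first position k with $a_k > a_{k+1}$, which is where the "If" would first succeed.

```python
from itertools import permutations
from math import factorial


def first_descent_counts(n):
    """N[k] = number of permutations of 1..n with a_1 < ... < a_k > a_{k+1} (1 <= k < n)."""
    N = [0] * n
    for a in permutations(range(1, n + 1)):
        for k in range(1, n):
            if a[k - 1] > a[k]:
                N[k] += 1
                break
    return N


n = 6
N = first_descent_counts(n)
print(N[1:])
average = sum(k * N[k] for k in range(1, n)) / factorial(n)
print(average)
```

Only the increasing permutation has no descent, so the counts add up to n! - 1. Numerically the series $\sum_k k^2/(k+1)!$ is about 1.718.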

(c) The minimum running time occurs when $a_n > a_{n+1}$ and this time is Θ(n). By previous results
the maximum running time is also Θ(n). Thus the average running time is Θ(n).

Section  B.3 

B.3.1 (a) If we know χ(G), then we can determine if c colors are enough by checking if
$c \ge \chi(G)$.

(b) We know that $0 \le \chi(G) \le n$ for a graph with n vertices. Ask if c colors suffice for c = 0, 1, 2, ....
The least c for which the answer is "yes" is χ(G). Thus the worst case time for finding χ(G)
is at most n times the worst case time for the NP-complete problem. Hence one time is O of
a polynomial in n if and only if the other is.
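The reduction in (b) can be sketched in Python as follows. The decision procedure `colorable` here is a brute-force stand-in chosen only for illustration (its name and the edge-list representation are assumptions); the point is the outer loop in `chromatic_number`, which asks the yes/no question for c = 0, 1, 2, ... and stops at the first "yes".

```python
from itertools import product


def colorable(n, edges, c):
    """Decision problem: can vertices 0..n-1 be properly colored with c colors?"""
    return any(
        all(col[u] != col[v] for u, v in edges)
        for col in product(range(c), repeat=n)
    )


def chromatic_number(n, edges):
    """Least c for which the decision procedure answers yes."""
    c = 0
    while not colorable(n, edges, c):
        c += 1
    return c


five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(chromatic_number(5, five_cycle))
```

Since n colors always suffice, the loop makes at most n + 1 calls to the decision procedure, so a polynomial bound on one running time gives a polynomial bound on the other.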
The style of a page number indicates the nature of the text material:

the  style    123    indicates  a  definition, 

the  style    123    indicates  an  extended  discussion,  and 

the  style    123    indicates  a  short  discussion  or  brief  mention. 


A 

accepting  state,  190. 
Adjacent  Comparisons  Theorem, 
241. 

admissible flow, 172.
alcohol,  334. 

Algebra,  Fundamental  Theorem  of, 

387. 
algorithm, 

analysis, 367-373.

greedy,  79,  151. 

Kruskal's,  151. 

parallel,  239. 

pipelining,  240. 

Prim's,  151. 

recursive,  207,  195. 

running  time,  367-373. 

tree  traversal,  248-252. 
and  (logical),  59. 

articulation  point,  see  vertex,  cut. 
asymptotic  analysis,  307. 
asymptotic  to,  340. 
asymptotically  equal,  373. 
Augmentable  Path  Theorem,  173. 
automaton,  see  finite  automaton. 

B 

backtracking,  84-88. 

balance,  171. 

balls  in  boxes,  37,  40,  295. 

base  case,  198. 

bicomponent,  154. 

Big oh ($O(\,)$ and $O^+(\,)$), 368.

bijection, 43.

binary  relation,  see  relation,  binary. 


binary  tree,  259. 

binomial  coefficient,  19-20,  77. 

Binomial  Theorem,  20,  273. 

Bonferroni's  Inequality,  102. 

breadth-first, 

edge  sequence,  249. 

order,  249. 

traversal,  249. 

vertex  sequence,  249. 
buckminsterfullerene, 165.
Burnside's  Lemma,  112-115,  330. 

C

canonical,  51. 
capacity,  171,  172,  177. 
card  hands,  21-23,  68. 
Cartesian  product,  7. 
Catalan  numbers,  15-17,  30,  39, 

251,  279-280. 
characteristic  function,  101,  115. 
Chebyshev's  inequality,  288,  325, 

385. 

Chi ($\chi(S)$), 101, 115.

child,  139,  247. 

chromatic number, 378, 379.

chromatic  polynomial,  158-161. 

Church's  Thesis,  188. 

clique,  183. 

clique  number,  183. 

Codes,  error  correcting,  27-30. 

codomain,  42. 

coimage, 54.

comparator,  238. 

complement,  200. 

composition  of  an  integer,  10. 

composition,  45. 
conjunctive form, 203.
connected  component,  133. 
convolution,  272. 
cut  partition,  177. 
cycle,  133. 

directed,  142. 

Hamiltonian,  136. 

index,  333. 

length,  147. 

of  a  permutation,  see 
permutation,  cycle. 

permutation,  46. 

D 

data  structure. 

graph  representation,  146. 

RP-tree,  264. 

tree  traversal,  249. 
decision,  66. 
degree  of  the  face,  164. 
DeMorgan's  law,  61. 
depth-first,

edge  sequence,  248. 

postorder  traversal,  249. 

preorder  traversal,  249. 

traversal,  84,  248. 

vertex  sequence,  248. 
derangement,  50,  99,  283,  318. 
derivation,  255. 
descendant,  153. 
digraph,  142. 

enumerative  use,  310-312. 

finite  automaton  and,  189-192. 

functional,  142,  329,  356. 

simple,  142. 

strongly  connected,  144. 
direct  insertion  order,  71,  78-79. 
direct  product,  see  set,  direct 

product, 
disjunctive  form,  200. 
disjunctive  normal  form,  62. 
distinct  representative,  178-179. 
distribution,  uniform,  381. 
divide  and  conquer,  9,  220,  232. 
Dobinski's  formula,  320. 
domain,  42. 

domino  arrangement,  89,  310-312. 


E 

edge,  66,  122,  123. 

contraction,  159. 

cut,  134. 

deletion,  159. 

parallel,  122,  125. 

subgraph,  induced  by,  133. 
EGF,  316. 

elementary  event,  381. 
envelope  game,  42. 
equations,  number  of  solutions  to, 
98. 

equivalence  relation,  see  relation, 

equivalence. 
Error  correcting  code,  27-30. 
error,  relative,  12. 
Euler  characteristic,  162. 
Euler  phi  function,  100,  105. 
Euler's  relation,  163-164. 
Eulerian  trail,  145. 
event,  381. 

event,  elementary,  381. 
exclusive  or,  60. 

expectation  of  a  random  variable, 
384. 

exponential  formula,  321-326. 

F 

falling  factorial,  11. 

father,  139. 

Ferris  wheel,  112. 

Fibonacci  numbers,  34,  36,  224, 

275-276,  277. 
finite  automaton,  189-192. 

deterministic,  191. 

enumerative  use,  310-312. 

grammar  of,  191,  256. 

non-d=d,  191. 

nondeterministic,  191. 

regular  sequences  and,  304. 
finite  state  machine,  see  finite 

automaton, 
flow,  171. 

flows  in  network,  170-179. 
forest,  326. 

Four  Color  Theorem,  158. 
full  binary  tree,  259. 
function,  42. 

$\lfloor x \rfloor$ (floor), 26.

$\lceil x \rceil$ (ceiling), 344.

bijective,  43. 

Boolean,  59,  200. 

characteristic,  101,  115. 

codomain  of,  42. 

coimage  of,  54. 

domain  of,  42. 

Euler phi ($\varphi$), 100, 105, 338.

$\exp(z) = e^z$, 321.

Gamma ($\Gamma$), 349.

generating,  see  generating 
function. 

graph,  incidence,  123. 

image  of,  54. 

incidence,  123. 

injective,  43. 

inverse,  43,  54. 

lex  order  listing,  76. 

monotone,  52. 

monotonic,  51-52. 

$\binom{n}{k}$ (binomial coefficient), 19.

nondecreasing,  51. 

one-line  notation  for,  42. 

one-to-one,  see  function, 
injective. 

onto,  see  function,  surjective. 

ordinary  generating,  269. 

rank,  see  rank. 

recursive,  188. 

restricted  growth,  58. 

restriction  of,  132. 

strictly  decreasing,  lex  order, 
77-78. 

surjective,  43,  97. 

two  line  notation  for,  43. 

unrank,  see  rank. 
Fundamental  Theorem  of  Algebra, 
387. 


G

generating function, 19, 36, 269.

exponential,  316. 

ordinary,  269. 
grammar, 

automaton  and,  191. 

context-free,  254. 

nonterminal  symbol,  254. 

phrase  structure,  256. 

production,  254. 

regular,  256. 

regular  and  finite  automaton,  256. 



terminal  symbol,  254. 
graph,  123. 

bicomponents  of,  154-155. 

biconnected,  154,  167. 

bipartite,  134. 

circuit,  135. 

clique  number,  183. 

coloring,  157-161,  372,  378. 

complete,  160. 

connected,  133. 

directed,  142. 

dual,  162. 

embedding,  149,  165. 

Eulerian,  135. 

Hamiltonian,  136. 

isomorphism,  128. 

matrix  representation  of,  146. 

multigraph,  124. 

oriented  simple,  144. 

planar,  149,  162-169. 

planar,  coloring,  165-166. 

regular,  164. 

rooted,  138. 

simple,  122. 

st-labeling,  168. 

unlabeled,  125,  129. 
Gray  code,  81-82,  86,  220. 
group,  112. 

alternating,  49. 

cyclic,  116,  333. 

dihedral,  116. 

permutation,  112. 

symmetric,  49. 

H 

hands  of  cards,  21-23,  68. 
Hanoi,  Tower  of,  214-216. 
heap,  236. 

Heawood's  Theorem,  165. 
height,  230. 

I 

identifier,  255. 
image,  54. 

Inclusion  and  Exclusion  Principle, 
94-104. 

independent  random  variables,  383. 
induction,  198-203,  361-365. 
induction  assumption,  198,  361. 
induction hypothesis, 198, 361.
inductive  step,  198,  361. 
initial  condition,  see  recursion, 

initial  condition, 
injection,  43. 
input  symbol,  190. 
involution,  47. 
isomer,  334. 

isomorphism,  graph,  128. 
isthmus,  see  edge,  cut. 

K 

Kuratowski's  theorem,  162. 

L 

l'Hôpital's Rule, 352.
Lagrange  inversion,  326. 
language,  254,  256. 
Language  Recognition  Problem, 
378. 

Latin rectangle, 179.
Latin  square,  88. 
leaf,  139,  0,  247. 
lex  order,  7. 

bucket  sort  and,  234. 

functions  listed  in,  76. 

listing  in,  66. 

permutations  listed  in,  70. 

strictly  decreasing  functions 
listed  in,  77. 
lexicographic  order,  see  lex  order, 
lineal  pair,  153. 
linear  ordering,  11. 

see  also  list, 
list,  5,  6. 

basic  counting  formulae,  40. 

circular,  13,  112. 

unordered,  51-52. 

with repetition, 6-8.

without  repetition,  11-12. 
listing, 

by  backtracking,  84-87. 

symmetry  invariant,  109. 
Little  oh  (o(  )),  373. 
local  description,  213. 
loop,  142. 


M 

machine,  finite  state,  see  finite 

automaton, 
map,  158. 

map  coloring,  165-166. 
map,  coloring,  158. 
Marriage  Problem,  Stable,  194. 
Marriage  Theorem,  178. 
matrix, 

graph  represented  by,  146. 

nilpotent, 147.

permutation,  48. 
Max-Flow  Min-Cut  Theorem,  177. 
maximum flow, 172.
mean of a random variable, 384.
Menger's theorem, 180.
mergesort,  see  sort,  merge. 
Monte  Carlo  simulation,  264. 
multinomial  coefficient,  23-24,  31. 
Multinomial  Theorem,  31. 
multiplication,  polynomial,  343. 
multiset,  5,  36-37,  51-52. 

N 

necklace,  113. 
network  flow,  170-179. 
node,  see  vertex, 
nonterminal  symbol,  254. 
notation, 

one-line, 42.

postorder,  253. 

reverse  Polish,  253. 

two-line,  43. 
NP-complete,  see  problem, 

NP-complete. 
n,  41. 
number, 

Catalan,  15-17,  30,  39,  251, 
279-280. 

chromatic,  378,  379. 

Fibonacci,  34,  36,  199,  204,  224, 
275-276,  277. 

partition  of,  37,  295. 

Stirling,  34,  324. 


O

OGF,  269. 

Oh, big oh ($O(\,)$ and $O^+(\,)$), 368.

Oh, little oh ($o(\,)$), 373.

one-line notation, 42.

or  (logical),  59. 

or  (logical),  exclusive,  60. 

orbit,  330. 

order,  linear,  see  list, 
output  symbol,  190. 

P 

palindrome,  9. 

parallel  edges,  122. 

parent,  139. 

parsing,  255. 

partial  fraction,  276,  387. 

partial  order,  103. 

partition  of  a  number,  37,  295. 

partition  of  a  set,  see  set  partition. 

partition,  cut,  176-179. 

Pascal's  triangle,  32. 

path,  66,  131. 

augmentable,  173. 

directed,  142. 

increment of the, 173.

length,  147. 
permutation,  43,  45-48. 

alternating,  327. 

cycle  form,  46. 

cycle  length,  46. 

derangement,  50. 

direct  insertion  order,  71,  78-79. 

involution,  47. 

lex  order  listing,  70. 

parity  of,  49. 

transposition  order,  72. 
Philip  Hall  Theorem,  178. 
Pigeonhole  Principle,  55-57,  58. 
pipelining,  240. 
Polya's  Theorem,  333-338. 
polynomial  multiplication,  343. 
polynomial,  chromatic,  158-161. 
popping,  215. 
postorder, 

edge sequence, 249.

notation, 253.

traversal, 249.

vertex sequence, 249.


preorder,

edge sequence, 249.

traversal, 249.

vertex sequence, 249.
principal  subtree,  247. 
probability  space,  381. 
problem, 

assignment,  178. 

bin  packing,  379. 

coloring,  378. 

halting,  188. 

intractable,  378. 

n  queens,  90. 

NP-complete,  377-379. 

NP-easy,  378. 

NP-hard,  378. 

scheduling,  158. 

tractable,  378. 
production,  254,  256. 
programming, 
Prüfer sequence, 141.
pushing,  215. 

Q 

queens  problem,  90. 
queue,  249. 

R 

Ramsey  Theory,  201. 
random  variable,  382. 

expectation,  384. 

independence,  383. 

mean,  384. 

variance,  384. 
rank,  70. 

all  functions  in  lex  order,  77. 

formula  for,  76. 

permutations  in  direct  insertion 
order,  79. 

strictly  decreasing  functions  in 
lex  order,  78,  87. 

subsets  of  given  size,  78,  87. 
recursion,  32-35,  197,  224. 

constant  coefficient,  282-283. 

initial  condition,  32. 

linear,  341. 
recursive, 

approach,  204,  195. 

definition,  204. 

formula,  204. 

solution,  204. 
regular  sequence,  304. 

counting,  296. 
relation, 

binary,  103,  127,  142. 

covering,  145. 

equivalence,  103,  111,  126-130, 
154. 

residual  tree  of,  75. 
reverse  Polish  notation,  253. 
root,  138,  247. 
rooted  map,  330. 
RP-tree,  139,  247. 

and  parsing,  255. 

binary,  248-266,  288-289. 

binary,  counting,  279-280. 

breadth-first  traversal,  249. 

depth-first  traversal,  248. 

number  of  leaf,  286-288. 

postorder traversal, 249.

preorder  traversal,  249. 

restricted,  counting,  353-354. 

statistic,  264-265. 

unlabeled, 259-266.
Rule  of  Product,  6,  9-10,  292-298, 

317-320. 
Rule  of  Sum,  8,  9-10,  292-298, 
317-320. 

S

sample,  6. 

SDR  (system  of  distinct 

representatives),  178. 
selection,  6. 
sentence,  254. 
sequence,  6. 

sequence,  counting,  296,  304,  318. 
series, 

see  also  function,  generating, 
alternating,  99. 
bisection  of,  274. 
geometric,  271,  273. 
multisection  of,  274. 
set,  5,  19-23,  51-52. 
cut, 176-179.
direct  product,  41. 
incomparable,  25. 
partially  ordered,  103. 


set  partition,  34,  54. 
block,  34. 

Dobinski's  formula,  320. 

partition,  block,  54. 
sibling,  139. 
sink,  172. 
son,  139,  247. 
sort, 

see  also  sorting  network, 
binary  insertion,  233. 
bucket,  234. 

comparisons,  lower  bound  on,  229. 
divide  and  conquer  approach,  221. 
Heapsort,  236. 
insertion,  233. 
merge,  206,  210-212,  235. 
merge,  time,  278. 
methods,  227-228. 
Quicksort,  235. 
Quicksort,  time,  289-290,  350. 
sorting  network,  238. 

Adjacent  Comparisons  Theorem, 
241. 

Batcher,  235,  239,  243-244,  308. 

brick  wall,  239,  244. 

Bubble,  239. 

Zero-One  Principle,  241. 
source,  172. 
space,  probability,  381. 
Sperner  family,  25. 
Stable  Marriage  Problem,  194. 
stack,  15,  215,  249. 
standard  deviation,  384. 
start  symbol,  254. 
starting  state,  189. 
state  transition  table,  189. 
Stirling  number, 

Stirling  numbers  of  the  first  kind, 
50,  324. 

Stirling  numbers  of  the  second  kind, 

34,  54,  97. 
Stirling's  formula  (n!),  12,  39. 
string,  6. 

string,  counting,  296,  304,  318. 
subgraph,  132. 

directed,  142. 

induced  by  vertex,  133. 
subset,  77,  82,  86,  87. 
sum,  rapidly  decreasing  terms,  345. 
sum,  slowly  decreasing  terms,  345. 
surjection,  43,  97. 
syntax, 254.

T 

Taylor's  Theorem,  267,  270. 
terminal  symbol,  254. 

theorems  (some), 

Adjacent  Comparisons  Theorem, 
241. 

Augmentable Path Theorem, 173.
Binomial Theorem, 20.
Bonferroni's inequality, 102.
Burnside's Lemma, 112.
Chebyshev's Inequality, 385.
convolution formula, 272.
Dobinski's formula, 320.
exponential  formula,  321. 
Five  Color  (Heawood's)  Theorem, 
165. 

Four  Color  Theorem,  158. 
Fundamental  Theorem  of 

Algebra,  387. 
Kuratowski's  Theorem,  162. 
Lagrange  inversion,  326. 
Marriage  (Philip  Hall)  Theorem, 

178. 

Max-Flow  Min-Cut  Theorem,  177. 

Multinomial  Theorem,  31. 

Pigeonhole  Principle,  55. 

Polya's  Theorem,  333. 

Principle  of  Inclusion  and 
Exclusion,  95. 

Ramsey's  Theorem,  202. 

Rule  of  Product,  6. 

Rule  of  Sum,  8. 

Stirling's  formula  (n!),  12. 

Taylor's  Theorem,  270. 

Vandermonde's formula, 274.

Zero-One  Principle,  241. 
Theta ($\Theta(\,)$), 368.
Tower  of  Hanoi,  214-216. 
trail,  131. 

transposition  order,  72. 
Traveling  Salesman  Problem,  378. 
traversal,  248-252. 
tree,  66,  136,  139. 

binary,  140,  259. 

decision,  65. 

depth  first  spanning,  153. 
free,  136. 

full  binary,  140,  259. 


heap,  236. 

height  of  rooted,  140. 

height  of,  230. 

labeled,  counting,  143-144. 

leaves  of,  66. 

lineal  spanning,  153. 

ordered  rooted,  66. 

parse,  255. 

postorder  traversal,  249. 
preorder  traversal,  249. 
root  of,  66. 

rooted  labeled,  counting,  326. 

rooted  plane,  see  RP-tree. 

rooted  unlabeled,  counting, 
354-355. 

spanning, 139, 150-154.

spanning,  minimum  weight, 
150-154. 

traversal  algorithms,  84,  248-252. 
truth  table,  61. 
Turing  machine,  188. 
two-line  notation,  43. 

U

uniform  distribution,  381. 
unrank,  see  rank. 

V 

value,  172. 

Vandermonde's  formula,  274. 

variable,  random,  382. 

variance  of  a  random  variable,  384. 

Venn  diagram,  96. 

vertex,  66,  122,  123. 

adjacent,  123. 

cut,  134. 

degree,  124,  126. 

isolated,  125,  132. 

loop,  125. 

sequence,  131. 

sink,  172. 

source,  172. 

terminal  (leaf),  140. 
VLSI  design,  169. 
W

walk,  131,  147. 
weakly  decreasing,  52. 
word,  6. 

Z


Zero-One  Principle,  241. 
Section 1.1

1.1.1. We can form $n$ digit numbers by choosing the leftmost digit AND choosing the next digit
AND $\cdots$ AND choosing the rightmost digit. The first choice can be made in 9 ways since a leading
zero is not allowed. The remaining $n-1$ choices can each be made in 10 ways. By the Rule of
Product we have $9 \times 10^{n-1}$.

To count numbers with at most $n$ digits, we could sum up $9 \times 10^{k-1}$ for $1 \le k \le n$. The sum
can be evaluated since it is a geometric series. This does not include the number 0. Whether we
add 1 to include it depends on our interpretation of the problem's requirement that there be no
leading zeroes. There is an easier way. We can pad out a number with fewer than $n$ digits by adding
leading zeroes. The original number can be recovered from any such $n$ digit number by stripping off
the leading zeroes. Thus we see by the Rule of Product that there are $10^n$ numbers with at most $n$
digits. If we wish to rule out 0 (which pads out to a string of $n$ zeroes), we must subtract 1.
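
Both counts are easy to confirm by brute force for small $n$. The sketch below is only a check (the helper name is ours): it counts the positive integers whose decimal representation has exactly $n$ digits.

```python
def count_n_digit(n):
    """Count positive integers with exactly n decimal digits (no leading zero)."""
    return sum(1 for x in range(1, 10**n) if len(str(x)) == n)

for n in range(1, 5):
    exact = count_n_digit(n)
    at_most = sum(count_n_digit(k) for k in range(1, n + 1))
    # Rule of Product count, and the padded count with 0 ruled out
    print(n, exact, 9 * 10**(n - 1), at_most, 10**n - 1)
```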

1.1.2.  The  only  possible  vowel  and  consonant  pattern  satisfying  the  two  nonadjacent  vowels  and 
initial  and  terminal  consonant  conditions  is  CVCVC.  By  the  Rule  of  Product,  there  are 

$3 \times 2 \times 3 \times 2 \times 3 = 108$ possibilities.

1.1.3. List the elements of the set in any order: $a_1, a_2, \ldots, a_{|S|}$. We can construct a subset by

including $a_1$ or not AND
including $a_2$ or not AND
$\cdots$
including $a_{|S|}$ or not.

Since there are 2 choices in each case, the Rule of Product gives $2 \times 2 \times \cdots \times 2 = 2^{|S|}$.
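
The include-or-not construction translates directly into code. In this sketch (the function name is ours) each subset comes from one sequence of True/False choices, one choice per element.

```python
from itertools import product

def subsets(elements):
    """Build every subset by deciding, element by element, to include it or not."""
    elements = list(elements)
    result = []
    for choices in product([False, True], repeat=len(elements)):
        result.append({x for x, keep in zip(elements, choices) if keep})
    return result

all_subsets = subsets("abcd")
print(len(all_subsets))  # one subset per sequence of 4 binary choices
```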

1.1.4 (a) To form a composition of $n$, we can write $n$ ones in a row and insert either "+" or ","
in the spaces between them. This is a series of 2 choices at each of $n-1$ spaces, so we obtain
$2^{n-1}$ compositions of $n$.

(b) Reversing the roles of "+" and "," in a composition with $k$ parts gives a composition with $n+1-k$ parts.
Since this reversal is a one-to-one correspondence between compositions of $n$ and the average
number of parts in the two corresponding compositions is $\frac{k + (n+1-k)}{2} = \frac{n+1}{2}$, we are done.
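
A short sketch of the construction in (a), which also lets us check the average in (b) for small $n$. The function name is ours and $n \ge 1$ is assumed.

```python
from itertools import product
from fractions import Fraction

def compositions(n):
    """All compositions of n >= 1: put '+' or ',' in each of the n-1 gaps
    between n ones; '+' merges neighbours, ',' starts a new part."""
    result = []
    for seps in product("+,", repeat=n - 1):
        parts, current = [], 1
        for s in seps:
            if s == "+":
                current += 1
            else:
                parts.append(current)
                current = 1
        parts.append(current)
        result.append(tuple(parts))
    return result

comps = compositions(5)
average = Fraction(sum(len(c) for c in comps), len(comps))
print(len(comps), average)
```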

1.1.5.  The  answers  are  SISITS  and  SISLAL.  We'll  come  back  to  this  type  of  problem  when  we  study 
decision  trees. 

Section  1.2 

1.2.1. If we want all assignments of birthdays to people, then repeats are allowed in the list mentioned
in the hint. This gives $365^{30}$. If we want all birthdays distinct, no repeats are allowed in the
list. This gives $365 \times 364 \times \cdots \times (365-29)$. The ratio is 0.29. How can this be computed? There are
a lot of possibilities. Here are some.

•  Use  a  symbolic  math  package. 

•  Write  a  computer  program. 

•  Use  a  calculator.  Overflow  may  be  a  problem,  so  you  might  write  the  ratio  as 

$(365/365) \times (364/365) \times \cdots \times (336/365)$.

•  Use  (1.2).  You  are  asked  to  do  this  in  the  next  problem.  Unfortunately,  there  is  no  guarantee 
how  large  the  error  will  be. 

• Use Stirling's formula after writing the numerator as $365!/335!$. Since Stirling's formula has an
error guarantee, we know we are close enough. Computing the values directly from Stirling's
formula may cause overflow. This can be avoided in various ways. One is to rearrange the
various factors by using some algebra:

$$\frac{\sqrt{2\pi \cdot 365}\,(365/e)^{365}}{\sqrt{2\pi \cdot 335}\,(335/e)^{335}\,(365)^{30}} = \sqrt{365/335}\;(365/335)^{335}/e^{30}.$$

Another way is to compute the logarithm of Stirling's formula and use that to estimate the
logarithm of the answer.
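
Three of these approaches can be sketched as follows. The function names are ours, and the logarithmic version uses the log-gamma function in place of a hand-written logarithm of Stirling's formula, so treat it as one possible realization rather than the intended one.

```python
import math

def ratio_by_product(n=365, k=30):
    """Multiply the factors (n-i)/n one at a time, which avoids overflow."""
    ratio = 1.0
    for i in range(k):
        ratio *= (n - i) / n
    return ratio

def ratio_by_logs(n=365, k=30):
    """Work with logarithms: log n! - log (n-k)! - k log n."""
    log_ratio = math.lgamma(n + 1) - math.lgamma(n - k + 1) - k * math.log(n)
    return math.exp(log_ratio)

def ratio_by_stirling(n=365, k=30):
    """Rearranged Stirling estimate: sqrt(n/m) (n/m)^m / e^k with m = n - k."""
    m = n - k
    return math.sqrt(n / m) * (n / m) ** m / math.exp(k)

print(ratio_by_product(), ratio_by_logs(), ratio_by_stirling())
```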

1.2.2. We want $(n!/(n-k)!)/n^k$ when $n = 365$ and $k = 30$. (See the solution to the previous
exercise.) From (1.2),

$$\frac{n!}{(n-k)!\,n^k} \sim e^{-k^2/2n} \quad \text{provided } k = o(n^{2/3}).$$

Since we need $k = o(n^{2/3})$ and $365^{2/3} \approx 50$, our estimate may not be too good. The estimate
is $e^{-30^2/(2 \times 365)} = e^{-900/730} \approx 0.2915$, which is rather close to the correct answer of 0.2937.

1.2.3.  Each  of  the  7  letters  ABMNRST  appears  once  and  each  of  the  letters  CIO  appears  twice. 
Thus  we  must  form  an  ordered  list  from  the  10  distinct  letters.  The  solutions  are 


k = 2: $10 \times 9 = 90$
k = 3: $10 \times 9 \times 8 = 720$
k = 4: $10 \times 9 \times 8 \times 7 = 5040$

1.2.4.  This  can  be  done  in  many  ways.  Some  methods  lead  to  lots  of  cases  joined  by  OR  which  must 
be  added  by  the  Rule  of  Sum;  other  methods  lead  to  a  few  cases.  Here  is  one  of  the  simplest. 

For $k = 2$, the letters are distinct OR equal. By the previous exercise, there are 90 distinct
choices. Since the only repeated letters are CIO, there are 3 ways to get equal letters. This gives 93.

For $k = 3$, we have either all distinct OR two equal. The two-equal case can be worked out as
follows:

choose the repeated letter (3 ways) AND

choose the positions for the two copies of the letter (3 ways) AND

choose the remaining letter ($10 - 1 = 9$ ways).

By the previous exercise and the Rules of Sum and Product, we have $720 + 3 \times 9 \times 3 = 801$.

For $k = 4$, either all four letters are distinct OR there are just three distinct letters OR there
are just two distinct letters. There are 5040 ways to choose all letters distinct. In the second case, we
have one repeated letter and two distinct letters. Reasoning as for $k = 3$, we get $3 \times 6 \times 9 \times 8 = 1296$.
The last case is a bit trickier. (We'll get into problems associated with distinguishing pairs when we
discuss hands of cards later. For now, we'll avoid that problem.) We can proceed as follows:

choose the first letter (3 ways) AND

choose where the second occurrence of that letter is (3 ways) AND

choose the other letter (2 ways).

This  gives  18  for  a  grand  total  of  6354. 
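
Case analyses like this are easy to get wrong, so a brute-force check is worthwhile. The sketch below assumes only what the previous solution states: the letters ABMNRST are available once each and C, I, O twice each. The function name is ours.

```python
from itertools import permutations

# Seven letters available once, three letters available twice.
LETTERS = "ABMNRST" + "CIO" * 2

def count_lists(k, letters=LETTERS):
    """Number of distinct k-long lists that can be formed from the multiset."""
    return len(set(permutations(letters, k)))

print([count_lists(k) for k in (2, 3, 4)])
```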

1.2.5  (a)    Since  there  are  5  distinct  letters,  the  answer  is  5  x  4  x  3  =  60. 

(b) Since there are 5 distinct letters, the answer is $5^3 = 125$.

(c)  Either  the  letters  are  distinct  OR  one  letter  appears  twice  OR  one  letter  appears  three  times. 
We  have  seen  that  the  first  can  be  done  in  60  ways.  To  do  the  second,  choose  one  of  L  and  T 
to  repeat,  choose  one  of  the  remaining  4  different  letters  and  choose  where  that  letter  is  to  go, 
giving $2 \times 4 \times 3 = 24$. To do the third, use T. Thus, the answer is $60 + 24 + 1 = 85$.
1.2.6. For (a) we have $5 \times 4 \times \cdots \times (6-k)$. For (b) we have $5^k$. For (c) we omit the details, just
noting

For $k = 1$, there are 5. For $k = 2$, there are 22. For $k = 4$, there are 286.

For $k = 5$, there are 820. For $k = 6$, there are 1920. For $k = 7$, there are 3360.
For $k = 8$, there are 3360. For $k > 8$, there are none.
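
The values for (c) can be checked by brute force. The multiset used here is an assumption inferred from the previous solution: L available twice, T three times, and three further letters once each (written x, y, z as placeholders, since the solution does not name them).

```python
from itertools import permutations

# Assumed multiset: L twice, T three times, three single letters (placeholders).
LETTERS = "LL" + "TTT" + "xyz"

def count_lists(k, letters=LETTERS):
    """Number of distinct k-long lists that can be formed from the multiset."""
    return len(set(permutations(letters, k)))

for k in range(1, 10):
    print(k, count_lists(k))
```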

1.2.7  (a)    push,  push,  pop,  pop,  push,  push,  pop,  push,  pop,  pop.  Remembering  to  start  with 
something, say $a$, on the stack: $(a(bc))((de)f)$.

(b)  This  is  almost  the  same  as  (a).  The  sequence  is  112211212122  and  the  last  "pop"  in  (a)  is 
replaced  by  "push,  pop,  pop." 

(c) $a((b((cd)e))(fg))$; push, push, push, pop, push, pop, pop, push, push, pop, pop, pop;
111010011000. 

1.2.8. If we remove the first vote (which must be for the first candidate) and the last vote (which
must be for the second candidate), we now have an election where ties are allowed but each candidate
receives only $n-1$ votes. Thus the answer is $C_{n-1}$.

1.2.9. Stripping off the initial R and terminal F, we are left with a list of at most 4 letters, at least
one of which is an L. There is just 1 such list of length 1. There are $3^2 - 2^2 = 5$ lists of length 2,
namely all those made from E, I and L minus those made from just E and I. Similarly, there are
$3^3 - 2^3 = 19$ of length 3 and $3^4 - 2^4 = 65$ of length 4. This gives us a total of 90.
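
The subtraction argument (all lists minus those avoiding L) can be confirmed by listing the middle parts directly. The function name is ours.

```python
from itertools import product

def lists_with_L(length):
    """Lists of the given length over E, I, L that contain at least one L."""
    return sum(1 for w in product("EIL", repeat=length) if "L" in w)

counts = [lists_with_L(m) for m in range(1, 5)]
print(counts, sum(counts))
```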

The  letters  used  are  E,  F,  I,  L  and  R  in  alphabetical  order.  To  get  the  word  before  RELIEF, 
note  that  we  cannot  change  just  the  F  and/or  the  E  to  produce  an  earlier  word.  Thus  we  must 
change  the  I  to  get  the  preceding  word.  The  first  candidate  in  alphabetical  order  is  F,  giving  us 
RELF.  Working  backwards  in  this  manner,  we  come  to  RELELF,  RELEIF,  RELEF  and,  finally, 
RELEEF. 

1.2.10. If there are 4 letters besides R and F, then there is only one R and one F, for a total of
65 spellings by the previous problem. If there are 3 letters besides R and F, we may have R···F,
R···FF or RR···F, which gives us $3 \times 19 = 57$ words by the previous problem. We'll say there are 3
RF patterns, namely RF, RFF and RRF. If there are 2 letters besides R and F, there are 6 RF patterns,
namely the three just listed, RFFF, RRFF and RRRF. This gives us $6 \times 5 = 30$ words. Finally, the
last case has the 6 RF patterns just listed as well as RFFFF, RRFFF, RRRFF and RRRRF for a
total of 10 patterns. This gives us 10 words since the one remaining letter must be L. Adding up all
these cases gives us $65 + 57 + 30 + 10 = 162$ possible spellings. Incidentally, there is a simple formula
for the number of $n$ long RF patterns, namely $n - 1$. Thus there are

$$1 + 2 + \cdots + (n-1) = n(n-1)/2$$

of length at most $n$. This gives our previous counts of 1, 3, 6 and 10. The spelling five before RELIEF
is REILIF.

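1.2.11. There are $n!/(n-k)!$ lists of length $k$. The total number of lists (not counting the empty
list) is

$$\frac{n!}{(n-1)!} + \frac{n!}{(n-2)!} + \cdots + \frac{n!}{1!} + \frac{n!}{0!} = n!\left(\frac{1}{0!} + \frac{1}{1!} + \cdots + \frac{1}{(n-1)!}\right) = n! \sum_{i=0}^{n-1} \frac{1^i}{i!}.$$

Since $e = e^1 = \sum_{i=0}^{\infty} 1^i/i!$, it follows that the above sum is close to $e$.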
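
As a check for small $n$, we can count the nonempty lists without repetition directly and compare with the closed form and with $n!\,e$. The function names are ours.

```python
import math
from itertools import permutations

def count_nonempty_lists(n):
    """Count lists without repetition of every length 1..n from an n-set."""
    return sum(len(list(permutations(range(n), k))) for k in range(1, n + 1))

def closed_form(n):
    """n! * (1/0! + 1/1! + ... + 1/(n-1)!), computed in exact integers."""
    return sum(math.factorial(n) // math.factorial(i) for i in range(n))

for n in range(1, 7):
    print(n, count_nonempty_lists(n), closed_form(n), math.factorial(n) * math.e)
```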
1.2.12 (a) This is just an ordered list, of which there are $n!$.

(b) There are two ways to convert such a seating into one of the type considered in (a): Seat left
to right or seat right to left. If the answer is $N$, this means that $N \times 2 = n!$ and so $N = n!/2$.

(c) Reading along one side and then the other, we get an ordered list and so the answer is $n!$.

(d) Now we can order each side separately and so we have $N \times 2 \times 2 = n!$. Thus $N = n!/4$.

(e) Now, if we switch one side left to right, we must do the same to the other side. This gives us

$N \times 2 = n!$ and so $N = n!/2$.

1.2.13.  We  can  only  do  parts  (a)  and  (d)  at  present. 

(a) A person can run for one of $k$ offices or for nothing, giving $k+1$ choices per person. By the
Rule of Product we get $k+1$ raised to the number of people.

(d) We can treat each office separately. The possible slates for an office are the subsets of
the set of candidates except the empty one, so their number is 1 less than 2 raised to the number of candidates. By the Rule of Product, we raise this count to the power $k$.

1.2.14. Suppose we have a circular list of prime length $n$ and that the lists obtained by cutting it
at two different positions separated by $k$ look the same. Let the list be $a_1, a_2, \ldots, a_n$. Since the two
cuts look the same, $a_i = a_{i+k}$ for all $i$, where subscripts that exceed $n$ are reduced to the range
$[1, n]$ by subtracting a multiple of $n$. Replacing $i$ by $i + k$, we get $a_{i+k} = a_{i+2k}$, and so $a_i = a_{i+2k}$.
Repeating this process, we get that $a_i = a_{i+mk}$ for any positive integer $m$. A result from number
theory, which we won't prove, states that when $n$ is a prime and $k$ is not a multiple of $n$, there is an
integer $r$ such that $rk - 1$ is a multiple of $n$. With $m = r$, we see that $a_i = a_{i+1}$. Thus, if two cuts of
the circular permutation give the same linear permutation, all the $a$'s are equal. Thus the number
of permutations of length $n$ in which the symbols are not all the same is $n$ times the number of
circular permutations of the same sort. With two letters we have $2^n - 2$ such permutations and so
there are $(2^n - 2)/n$ such circular permutations for a total of $2 + (2^n - 2)/n$ circular permutations.
For $L$ letters, replace 2 by $L$.
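
For small primes the count can be confirmed by brute force. In this sketch (names are ours) each circular list is represented by the smallest of its rotations, and the result is compared with $L + (L^n - L)/n$.

```python
from itertools import product

def circular_lists(n, letters):
    """Count circular lists of length n over an alphabet of the given size,
    identifying each list with the least of its n rotations."""
    seen = set()
    for word in product(range(letters), repeat=n):
        seen.add(min(word[i:] + word[:i] for i in range(n)))
    return len(seen)

def formula(n, letters):
    """L + (L^n - L)/n; the argument above needs n to be prime."""
    return letters + (letters**n - letters) // n

for n in (2, 3, 5, 7):
    for letters in (2, 3):
        print(n, letters, circular_lists(n, letters), formula(n, letters))
```

For composite $n$ the formula fails, as expected: with $n = 4$ and two letters there are 6 circular lists.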

Section  1.3 

1.3.1. After recognizing that $k = n\lambda$ and $n - k = n(1 - \lambda)$, it's simply a matter of algebra.

1.3.2. Instead of considering each ballot to be distinct, we can equally well think of two types of
votes, namely 1 and 2 depending on who was voted for. Let $T_n$ be the number of ways to order
the votes and let $A_n$ (resp. $B_n$) be the number of ways to order them so that the first (resp.
second) was never behind. The answer is $\frac{A_n + B_n}{T_n}$, where we need to evaluate $A_n$, $B_n$ and $T_n$. Since the
$n$ votes for the first candidate could be in any of the $2n$ positions, $T_n = \binom{2n}{n}$. By Example 1.12,
$A_n = B_n = C_n = \frac{1}{n+1}\binom{2n}{n}$, the Catalan numbers. Thus the desired probability is $\frac{2}{n+1}$.
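
A brute-force check for small $n$, assuming the quantity wanted is $(A_n + B_n)/T_n$, that is, the fraction of orderings in which one of the two candidates is never behind. The function name is ours.

```python
from fractions import Fraction
from itertools import combinations

def never_behind_probability(n):
    """Fraction of orderings of n votes each in which the first candidate is
    never behind or the second candidate is never behind."""
    total = favorable = 0
    for first_positions in combinations(range(2 * n), n):
        chosen = set(first_positions)
        lead = lowest = highest = 0
        for pos in range(2 * n):
            lead += 1 if pos in chosen else -1
            lowest = min(lowest, lead)
            highest = max(highest, lead)
        total += 1
        if lowest >= 0 or highest <= 0:
            favorable += 1
    return Fraction(favorable, total)

for n in range(1, 7):
    print(n, never_behind_probability(n), Fraction(2, n + 1))
```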

1.3.3.  Choose  values  for  pairs  AND  choose  suits  for  the  lowest  value  pair  AND  choose  suits  for  the 
middle value pair AND choose suits for the highest value pair. This gives $\binom{13}{3}\binom{4}{2}^3$ = 61,776.

1.3.4.  If  we  try  to  keep  track  of  all  possible  orders  (such  as  card  1,  card  2,  card  paired  with  1, 
card  4,  card  paired  with  4),  we  will  have  a  lot  of  cases  to  consider.  There  is  an  easier  way.  A  5-card 
hand  can  be  ordered  in  5!  ways,  so  this  is  the  number  of  ways  it  could  be  dealt.  Thus  we  simply 

multiply  the  number  of  2  pair  hands  by  5!. 

1.3.5.  Choose  the  lowest  value  in  the  straight  (A  to  10)  AND  choose  a  suit  for  each  of  the  5  values 
in the straight. This gives $10 \times 4^5 = 10240$.

Although  the  previous  answer  is  acceptable,  a  poker  player  may  object  since  a  "straight  flush" 
is better than a straight, and we included straight flushes in our count. Since a straight flush is a
straight all in the same suit, we only have 4 choices of suits for the cards instead of $4^5$. Thus, there
are $10 \times 4 = 40$ straight flushes. Hence, the number of straights which are not straight flushes is
$10240 - 40 = 10200$.
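
The straight counts are small enough to enumerate: one choice of lowest value, then a suit for each of the five cards. The sketch also evaluates the three-pair count from Exercise 1.3.3. Function names are ours.

```python
from itertools import product
from math import comb

def straights():
    """Enumerate (lowest value, suit for each of 5 cards); return the number
    of straights and how many of them are straight flushes."""
    total = flushes = 0
    for low in range(10):                      # lowest value: A, 2, ..., 10
        for suits in product(range(4), repeat=5):
            total += 1
            if len(set(suits)) == 1:           # all five cards in one suit
                flushes += 1
    return total, flushes

def three_pair_hands():
    """Choose 3 values for the pairs, then 2 of 4 suits for each pair."""
    return comb(13, 3) * comb(4, 2) ** 3

total, flushes = straights()
print(total, flushes, total - flushes, three_pair_hands())
```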

1.3.6. By Exercise 1.1.4, this is the number of ways to insert $k-1$ commas into $n-1$ positions,

which is $\binom{n-1}{k-1}$.

1.3.7.  This  is  like  Exercise  1.2.3,  but  we'll  do  it  a  bit  differently  Note  that  EXERCISES  contains 
3  E's,  2  S's  and  1  each  of  C,  I,  R  and  X.  By  the  end  of  Example  1.17,  wc  can  use  (1.4)  with  N  =  9, 
mi  =  3,  777-2  =  2  and  7713  =  7774  =  7775  =  7775  ~  1.  This  gives  9!/3!  2!  =  30240. 

It  can  also  be  done  without  the  use  of  a  multinomial  coefficient  as  follows.  Choose  3  of  the 
9  possible  positions  to  use  for  the  three  E's  AND  choose  2  of  the  6  remaining  positions  to  use  for 
the  two  S's  AND  put  a  permutation  of  the  remaining  4  letters  in  the  remaining  4  places.  This  gives 


The  number  of  eight  letter  arrangements  is  the  same.  To  see  this,  consider  a  9-list  with  the 
ninth  position  labeled  "unused." 
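Both counts for EXERCISES can be confirmed by brute force. This is a small sketch under the assumption that "arrangement" means an ordered list of letters in which identical letters are indistinguishable; the variable names are illustrative.

```python
from itertools import permutations
from math import factorial

WORD = "EXERCISES"

# Multinomial count from (1.4): 9! / (3! 2! 1! 1! 1! 1!)
by_formula = factorial(9) // (factorial(3) * factorial(2))

# Brute force: distinct orderings of all nine letters, and of any eight of them.
nine_letter = len(set(permutations(WORD)))
eight_letter = len(set(permutations(WORD, 8)))

print(by_formula, nine_letter, eight_letter)
```

The two brute-force counts agree because dropping the ninth letter of a nine-letter arrangement loses no information: the missing letter is determined by the other eight.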

1.3.8. An arrangement is an ordered list formed from 13 things each used 4 times. Thus we have $N = 52$ and $m_i = 4$ for $1 \le i \le 13$ in (1.4).

1.3.9. Think of the teams as labeled and suppose Teams 1 and 2 each contain 3 men. We can divide the men up in $\binom{11}{3,3,2,2,1}$ ways and the women in $\binom{11}{2,2,3,3,1}$ ways.

We must now count the number of ways to form the ordered situation from the unordered one. Be careful: it's not $4! \times 2$ as it was in the example! Thinking as in the early card example, we start out with two types of teams, say M or F depending on which sex predominates in the team. We also have two types of referees. Thus we have two M teams, two F teams, and one each of an F referee and an M referee. We can order the two M teams (2 ways) and the two F teams (2 ways), so there are only $2 \times 2$ ways to order and so the answer is $\binom{11}{3,3,2,2,1}^{2} \cdot \frac{1}{4}$.

1.3.10  (a)    LALALAL,  LALALAS,  LALALAT,  LALALIL. 

(b)  TSITSAT,  TSITSIL,  TSITSIS,  TSITSIT. 

(c)  LALSALS,  LALSALT,  LALSASL,  LALSAST. 

(d) The possible consonant vowel patterns are CCVCCVC, CCVCVCC, CVCCVCC and CVCVCVC. The first three each contain two pairs of adjacent consonants, one isolated consonant and two vowels. Thus each corresponds to $(3 \times 2)^2 \times 3 \times 2^2$ names. The last has four isolated consonants and three vowels and so corresponds to $3^4 \times 2^3$ names. In total, there are 1944 names.
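The total in (d) and the lists in (a) and (b) can be reproduced by enumeration. The sketch below assumes the alphabet is the consonants L, S, T and the vowels A, I, and that two adjacent consonants must be different letters; that rule is inferred from the factor $3 \times 2$ in the solution rather than quoted from the exercise.

```python
from itertools import product

CONSONANTS = "LST"
VOWELS = "AI"
PATTERNS = ["CCVCCVC", "CCVCVCC", "CVCCVCC", "CVCVCVC"]

def names_for(pattern):
    """Yield every name that fits one consonant/vowel pattern."""
    pools = [CONSONANTS if p == "C" else VOWELS for p in pattern]
    for letters in product(*pools):
        # Assumed rule: adjacent consonants may not be the same letter.
        if all(not (a == b and a in CONSONANTS)
               for a, b in zip(letters, letters[1:])):
            yield "".join(letters)

names = sorted(n for p in PATTERNS for n in names_for(p))
print(len(names))
print(names[:4])    # first four names in dictionary order
print(names[-4:])   # last four names in dictionary order
```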

1.3.11. The theorem is true when $k = 2$ by the binomial theorem with $x = y_1$ and $y = y_2$. Suppose that $k > 2$ and that the theorem is true for $k - 1$. Using the hint and the binomial theorem with $x = y_k$ and $y = y_1 + y_2 + \cdots + y_{k-1}$, we have that

$$(y_1 + y_2 + \cdots + y_k)^n = \sum_{j=0}^{n} \binom{n}{j} (y_1 + y_2 + \cdots + y_{k-1})^{n-j} \, y_k^{j}.$$

Thus the coefficient of $y_1^{m_1} \cdots y_k^{m_k}$ in this is $\binom{n}{m_k} = n!/((n - m_k)!\, m_k!)$ times the coefficient of $y_1^{m_1} \cdots y_{k-1}^{m_{k-1}}$ in $(y_1 + y_2 + \cdots + y_{k-1})^{n - m_k}$. When $n - m_k = m_1 + m_2 + \cdots + m_{k-1}$ the coefficient is $(n - m_k)!/(m_1!\, m_2! \cdots m_{k-1}!)$ and otherwise it is zero by the induction assumption. Multiplying by $\binom{n}{m_k}$, we obtain the theorem for $k$.

                 Values of k
             0    1    2    3    4    5
    n = 0    1    0    0    0    0    0
    n = 1    0    1    0    0    0    0
    n = 2    0    1    1    0    0    0
    n = 3    0    1    3    1    0    0
    n = 4    0    1    7    6    1    0
    n = 5    0    1   15   25   10    1

Figure S.1.1 Stirling numbers of the second kind. Rows are values of n.

Section  1.4 

1.4.1. The rows are 1,7,21,35,35,21,7,1 and 1,8,28,56,70,56,28,8,1.

1.4.2. The recursion makes sense for $k \ge 1$ and $n \ge 1$ if we define

$S(0, 0) = 1$ and $S(j, 0) = S(0, j) = 0$ for $j > 0$.

Other starting conditions are possible; for example, $S(j, 0) = 0$ for $j > 0$, $S(n, n) = 1$ for $n > 0$ and the recursion making sense for $0 < k < n$. (In the latter case, it is understood that the values for $k > n$ are 0.) Figure S.1.1 shows the computation of the values through $n = 5$.
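The table of values through $n = 5$ can be generated mechanically from the first set of starting conditions. This is a minimal sketch, assuming the recursion in question is $S(n, k) = S(n-1, k-1) + k\,S(n-1, k)$, the form used in the solution to 1.4.5; the function name is illustrative.

```python
def stirling2_table(n_max):
    """Return S with S[n][k] = S(n, k) for 0 <= n, k <= n_max."""
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S[0][0] = 1          # S(0,0) = 1; S(j,0) = S(0,j) = 0 for j > 0 are already 0
    for n in range(1, n_max + 1):
        for k in range(1, n_max + 1):
            S[n][k] = S[n - 1][k - 1] + k * S[n - 1][k]
    return S

table = stirling2_table(5)
for row in table:
    print(row)
```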

1.4.3. Let $L(n, k)$ be the number of ordered $k$-lists without repeats that can be made from an $n$-set $S$. Form such a list by choosing the first element AND then forming a $(k - 1)$-long list using the remaining $n - 1$ elements. This gives $L(n, k) = n L(n - 1, k - 1)$.

Single out one item $x \in S$. There are $L(n - 1, k)$ lists not containing $x$. If $x$ is in the list, it can be in any of $k$ positions AND the rest of the list can be constructed in $L(n - 1, k - 1)$ ways. Thus

$L(n, k) = L(n - 1, k) + k L(n - 1, k - 1)$.
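Both recursions can be checked against the closed form $L(n, k) = n!/(n-k)!$, which Python exposes as `math.perm`. The sketch below is only a numerical sanity check over small values, not a proof; the function name is illustrative.

```python
from math import perm   # perm(n, k) = n!/(n-k)!, and 0 when k > n

def recursions_hold(n_max):
    """Check both recursions for L(n, k) for 1 <= k <= n <= n_max."""
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            L = perm(n, k)
            if L != n * perm(n - 1, k - 1):
                return False
            if L != perm(n - 1, k) + k * perm(n - 1, k - 1):
                return False
    return True

print(recursions_hold(10))
```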

1.4.4 (a) This can be proved by writing the binomial coefficients in terms of factorials. It can also be proved from the definition of the binomial coefficient: Choosing a set of size $k$ from a set of size $n$ is equivalent to throwing away a set of size $n - k$, namely the things not chosen.

(b) The total number of subsets of an $n$ element set is $2^n$. On the other hand, we can divide the subsets into collections $T_i$, where $T_i$ contains all the $i$ element subsets. The number of subsets in $T_i$ is $\binom{n}{i}$. Apply the Rule of Sum.

(c) The easiest way to prove this is to take the generating function $(1 + x)^n = \sum_k \binom{n}{k} x^k$ and set $x = -1$.

(d) This can be done with generating functions or by a counting argument. For the former approach, write $(1 + x)^{n+m} = (1 + x)^n (1 + x)^m$, expand $(1 + x)^n$ and $(1 + x)^m$ by the binomial theorem, multiply the results and equate the coefficients of $x^k$ there and in $(1 + x)^{n+m}$. For the counting argument, consider disjoint sets $N$ and $M$ with $n$ and $m$ elements respectively. Choose $k$ elements from the union of $N$ and $M$. The left side counts this directly. The right side breaks this up according to how many of the $k$ elements come from $N$.
1.4.5. The only way to partition an $n$ element set into $n$ blocks is to put each element in a block by itself, so $S(n, n) = 1$. The only way to partition an $n$ element set into one block is to put all the elements in the block, so $S(n, 1) = 1$.

The only way to partition an $n$ element set into $n - 1$ blocks is to choose two elements to be in a block together and put the remaining $n - 2$ elements in $n - 2$ blocks by themselves. Thus it suffices to choose the 2 elements that appear in a block together and so $S(n, n - 1) = \binom{n}{2}$.

The formula for $S(n, n - 1)$ can also be proved using (1.8) and induction. The formula is correct for $n = 1$ since there is no way to partition a 1-set and have no blocks. Assume true for $n - 1$. Use the recursion, the formula for $S(n - 1, n - 1)$ and the induction assumption for $S(n - 1, n - 2)$ to obtain

$$S(n, n-1) = S(n-1, n-2) + (n-1)\, S(n-1, n-1) = \binom{n-1}{2} + (n-1) \cdot 1 = \binom{n}{2},$$

which completes the proof.

Now for $S(n, 2)$. Note that $S(n, k)$ is the number of unordered lists of length $k$ where the list entries are nonempty subsets of a given $n$-set and each element of the set appears in exactly one list entry. We will count ordered lists, which is $k!$ times the number of unordered ones. We choose a subset for the first block (first list entry) and use the remaining set elements for the second block. Since an $n$-set has $2^n$ subsets, this would seem to give $2^n/2$; however, we must avoid empty blocks. In the ordered case, there are two ways this could happen since either the first or second list entry could be the empty set. Thus, we must have $2^n - 2$ instead of $2^n$, and so $S(n, 2) = (2^n - 2)/2 = 2^{n-1} - 1$.

Here is another way to compute $S(n, 2)$. Look at the block containing $n$. Once it is determined, the entire two block partition is determined. The block is one of the $2^{n-1}$ subsets of $\{1, \ldots, n-1\}$ with $n$ adjoined. Since something must be left to form the second block, the subset cannot be all of $\{1, \ldots, n-1\}$. Thus there are $2^{n-1} - 1$ ways to form the block containing $n$.

The formula for $S(n, 2)$ can also be proved by induction using the recursion for $S(n, k)$ and the fact that $S(n, 1) = 1$, much as was done for $S(n, n - 1)$.
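The special values $S(n,n)$, $S(n,1)$, $S(n,n-1)$ and $S(n,2)$ can be checked without using the recursion at all, by listing every partition of a small set. The generator below is one simple way to do that (an illustrative sketch, with hypothetical function names): each element either joins an existing block or starts a new one.

```python
from math import comb

def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def stirling2(n, k):
    """Count partitions of {1, ..., n} having exactly k blocks."""
    return sum(1 for p in set_partitions(list(range(1, n + 1))) if len(p) == k)

for n in range(2, 7):
    print(n, stirling2(n, n), stirling2(n, 1), stirling2(n, n - 1), stirling2(n, 2))
```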

1.4.6.  One  approach,  which  we  might  call  "formal,"  is  to  look  at  the  recursion  and  see  what  works. 

We take a second approach, which we might call "combinatorial" or "constructive." In this approach, we look at the construction and ask what should happen when $k = 1$. Since $k = 1$, there is only one block, and removing it should leave nothing. Thus, the only term we want is the one with $j = n$ and this should give us 1 since $S(n, 1) = 1$. To achieve this we want $S(0, 0) = 1$ and $S(n, 0) = 0$ when $n > 0$.

1.4.7. There are $\binom{n}{k}$ ways to choose the subset AND $k$ ways to choose an element in it to mark. This gives the left side of the recursion times $k$. On the other hand, there are $n$ ways to choose an element to mark from $\{1, 2, \ldots, n\}$ AND $\binom{n-1}{k-1}$ ways to choose the remaining elements of the $k$-element subset.

1.4.8 (a) We use the hint. Choose $i$ elements of $\{1, 2, \ldots, n\}$ to be in the block with $n + 1$ AND either do nothing else if $i = n$ OR partition the remaining elements. This gives $\binom{n}{n}$ if $i = n$ and $\binom{n}{i} B_{n-i}$ otherwise. If we set $B_0 = 1$, the second formula applies for $i = n$, too. Since $i = 0$ OR $i = 1$ OR $\cdots$ OR $i = n$, the result follows.

(b) We have $B_0 = 1$ from (a). Using the formula in (a) for $n = 0, 1, 2, 3, 4$ in order, we obtain $B_1 = 1$, $B_2 = 2$, $B_3 = 5$, $B_4 = 15$ and $B_5 = 52$.
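The computation in (b) is easy to mechanize. This sketch assumes the formula from (a) reads $B_{n+1} = \sum_{i=0}^{n} \binom{n}{i} B_{n-i}$ with $B_0 = 1$, as the argument above suggests; the function name is illustrative.

```python
from math import comb

def bell_numbers(count):
    """Return [B_0, B_1, ..., B_{count-1}] using the recursion from part (a)."""
    B = [1]                                   # B_0 = 1
    for n in range(count - 1):
        # i elements join the block containing n + 1; the other n - i are partitioned.
        B.append(sum(comb(n, i) * B[n - i] for i in range(n + 1)))
    return B

print(bell_numbers(6))
```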

1.4.9 (b) Each office is associated with a nonempty subset of the people and each person must be in exactly one subset. This is a partition of the set of candidates with each block corresponding to an office. Thus we have an ordered partition of an $n$ element set into $k$ blocks. The answer is $k!\, S(n, k)$.

(c) This is like the previous part, except that some people may be missing. We use two methods.

First, let $i$ people run for no offices. The remaining $n - i$ can be partitioned in $S(n - i, k)$ ways and the blocks ordered in $k!$ ways. Thus we get $\sum_{i \ge 0} \binom{n}{i} k!\, S(n - i, k)$.

For the second method, either everyone runs for an office, giving $k!\, S(n, k)$, or some people do not run. In the latter case, we can think of a partition with $k + 1$ labeled blocks where the labels are the $k$ offices and "not running." This gives $(k + 1)!\, S(n, k + 1)$. Thus we have $k!\, S(n, k) + (k + 1)!\, S(n, k + 1)$. The last formula is preferable since it is easier to calculate from tables of Stirling numbers.

(e) Let $T(p, k)$ be the number of solutions. Look at all the people running for the first $k - 1$ offices. Let $t$ be the number of these people. If $t < p$, then at least $p - t$ people must be running for the $k$th office since everyone must run for some office. In addition, any of these $t$ people could run for the $k$th office. By the Rule of Product, the number of ways we can have this particular set of $t$ people running for the first $k - 1$ offices and some people running for the $k$th office is $T(t, k - 1)\, 2^t$. The set of $t$ people can be chosen in $\binom{p}{t}$ ways. Finally, look at the case $t = p$. In this case everyone is running for one of the first $k - 1$ offices. The only restriction we must impose is that a nonempty set of candidates must run for the $k$th office. Putting all this together, we obtain

$$T(p, k) = \sum_{t=1}^{p-1} \binom{p}{t} T(t, k-1)\, 2^t + T(p, k-1)(2^p - 1).$$

This recursion is valid for $p \ge 2$ and $k \ge 2$. The initial conditions are $T(p, 1) = 1$ for $p > 0$ and $T(1, k) = 1$ for $k > 0$.

Notice that if "people" and "offices" are interchanged, the problem is not changed. Thus $T(p, k) = T(k, p)$ and a recursion could have been obtained by looking at offices that the first $p - 1$ people run for. This would give us

$$T(p, k) = \sum_{t=1}^{k-1} \binom{k}{t} T(p-1, t)\, 2^t + T(p-1, k)(2^k - 1).$$
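The recursion for $T(p, k)$ can be compared with a direct count. The sketch below reads the problem as the argument in (e) describes it: each person may run for several offices, every person runs for at least one, and every office has at least one candidate. Under that reading a solution is a $p \times k$ table of 0s and 1s with no all-zero row or column. The summation limits and function names are assumptions of this sketch.

```python
from functools import lru_cache
from itertools import product
from math import comb

@lru_cache(maxsize=None)
def T(p, k):
    """Number of solutions for p >= 1 people and k >= 1 offices, by the recursion."""
    if p == 1 or k == 1:
        return 1
    return (sum(comb(p, t) * T(t, k - 1) * 2 ** t for t in range(1, p))
            + T(p, k - 1) * (2 ** p - 1))

def brute_force(p, k):
    """Count p-by-k 0/1 tables with no all-zero row and no all-zero column."""
    count = 0
    for cells in product((0, 1), repeat=p * k):
        rows = [cells[i * k:(i + 1) * k] for i in range(p)]
        if all(any(r) for r in rows) and all(any(r[j] for r in rows) for j in range(k)):
            count += 1
    return count

for p in range(1, 4):
    print([T(p, k) for k in range(1, 4)], [brute_force(p, k) for k in range(1, 4)])
```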

1.4.10. When we speak of a sequence in this exercise, we mean a sequence having no adjacent zeroes whose entries are from $\{0, 1, \ldots, d-1\}$. A sequence of length $n$ can be built from a sequence of length $n - 1$ by adding something other than zero OR it can be built from a sequence of length $n - 1$ that doesn't end in zero by adding zero. (To see this, simply note what is left when you remove the last digit from a sequence of length $n$.) The sequences of length $n - 1$ not ending in zero can all be built by adding something other than zero to a sequence of length $n - 2$. Putting this all together by using the Rules of Sum and Product and noting that there are $d - 1$ choices for a digit which is not zero, we get $A_n = (d - 1) A_{n-1} + (d - 1) A_{n-2}$. Since we refer to sequences of length $n - 2$, we require that $n \ge 3$. The initial values are $A_1 = d$ and $A_2 = d^2 - 1$ since the 2-sequences consist of everything except 0,0. (If we define $A_0 = 1$, we could start the recursion at $n = 2$.) When $d = 10$, the values of $A_i$ for $1 \le i \le 5$ are 10, 99, 981, 9720 and 96309.
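The recursion and the listed values for $d = 10$ can be checked against an exhaustive count for small lengths. A minimal sketch, using the convention $A_0 = 1$ mentioned above; the function names are illustrative.

```python
from itertools import product

def count_by_recursion(n, d):
    """A_n from A_n = (d-1) A_{n-1} + (d-1) A_{n-2}, with A_0 = 1 and A_1 = d."""
    a_prev, a = 1, d
    for _ in range(n - 1):
        a_prev, a = a, (d - 1) * a + (d - 1) * a_prev
    return a

def count_by_enumeration(n, d):
    """Count length-n sequences over {0, ..., d-1} with no two adjacent zeroes."""
    return sum(
        1
        for seq in product(range(d), repeat=n)
        if all(not (x == 0 and y == 0) for x, y in zip(seq, seq[1:]))
    )

print([count_by_recursion(n, 10) for n in range(1, 6)])
```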

Section  1.5 

1.5.1. For each element, there are $j + 1$ choices for the number of repetitions, namely anything from 0 to $j$, inclusive. By the Rule of Product, we obtain $(j + 1)^{|S|}$.

1.5.2. When $j = 1$, we are simply talking about subsets so the answer is $\binom{|S|}{k}$. Since a $k$-multiset contains $k$ things, when $j \ge k$, there is no restriction on repetition. Thus we are simply counting unrestricted multisets of which there are $\binom{|S|+k-1}{k} = \binom{|S|+k-1}{|S|-1}$.

1.5.3. To form an unordered list of length $k$ with repeats from $\{1, 2, \ldots, n\}$, either form a list without $n$ OR form a list with $n$. The first can be done in $M(n - 1, k)$ ways. The second can be done by forming a $(k - 1)$-element list AND then adjoining $n$ to it. This can be done in $M(n, k - 1) \times 1$ ways. Initial conditions: $M(n, 0) = 1$ for $n > 0$ and $M(0, k) = 0$ for $k > 0$.
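The recursion of 1.5.3 can be run directly and compared with both the closed form $\binom{n+k-1}{k}$ and an explicit listing of the unordered lists. In this sketch, treating $M(0, 0)$ as 1 is a convenience of the code, not something stated in the solution.

```python
from functools import lru_cache
from itertools import combinations_with_replacement
from math import comb

@lru_cache(maxsize=None)
def M(n, k):
    """Unordered k-lists with repeats from an n-set, via M(n,k) = M(n-1,k) + M(n,k-1)."""
    if k == 0:
        return 1            # the empty list (also used for n = 0 here, by convention)
    if n == 0:
        return 0
    return M(n - 1, k) + M(n, k - 1)

def listed(n, k):
    """Count the same lists by generating them explicitly."""
    return sum(1 for _ in combinations_with_replacement(range(n), k))

print([[M(n, k) for k in range(5)] for n in range(1, 5)])
```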

1.5.4. Let the elements of the set be "place a ball in box $i$" for $1 \le i \le n$. Select $k$ elements and do what they say. Clearly each placement arises exactly once this way.

1.5.5. Interpret the points between the $i$th and the $(i + 1)$st vertical bars as the balls in box $i$. Since there are $n + 1$ bars, there are $n$ boxes. Since there are $(n + k - 1) - (n - 1) = k$ points, there are $k$ balls.

1.5.6. As in the text, consider a term in the sum obtained by expanding

$(1 + x_1 + x_1 x_1)(1 + x_2 + x_2 x_2) \cdots (1 + x_n + x_n x_n)$

using the distributive law. The number of times $x_i$ appears in the term is the number of balls in box $i$. Thus box $i$ never has more than two balls. The total number of all kinds of $x_i$'s in the term is the number of balls. Replacing all $x_i$'s with $x$'s makes the number of $x$'s the number of balls in that particular placement. Suppose each box can contain up to $j$ balls. Instead of $1 + x + x^2$, we now have $1 + x + x^2 + \cdots + x^j$.

1.5.7. This exercise and the previous one are simply two different ways of looking at the same thing since an unordered list with repetitions allowed is the same as a multiset. The $n$th item must appear zero, one OR two times. The remaining $n - 1$ items must be used to form a list of length $k$, $k - 1$ or $k - 2$ respectively. This gives the three terms on the left. We generalize to the case where each item is used at most $j$ times: $T(n, k) = \sum_{i=0}^{j} T(n - 1, k - i)$.

1.5.8. We will induct on $n > 0$ and, for each value of $n$, we will induct on $k \ge 0$. It is easily verified that $M(1, k) = 1$ for all $k \ge 0$ and $M(n, 0) = 1$ for all $n > 0$ by the definition of $M$. This agrees with $\binom{n+k-1}{k}$. Thus we can assume that $n > 1$ and $k \ge 1$ for the induction step. From Exercise 1.5.3,

$M(n, k) = M(n - 1, k) + M(n, k - 1)$,

and so $M(n, k) = \binom{n+k-2}{k} + \binom{n+k-2}{k-1}$ by the induction hypothesis. The formula $\binom{N}{k} + \binom{N}{k-1} = \binom{N+1}{k}$ completes the proof.

1.5.9 (a) We give two solutions. Both use the idea of inserting a ball into a tube in an arbitrary position. To physically do this may require some manipulation of balls already in the tube.

1. Insert $b - 1$ balls into the tubes AND then insert the $b$th ball. There are $i + 1$ possible places to insert this ball in a tube containing $i$ balls. Summing this over all $t$ tubes gives us $(b - 1) + t$ possible places to insert the $b$th ball. We have proved that

$f(b, t) = f(b - 1, t)(b + t - 1)$.

Since $f(1, t) = t$, we can establish the formula by induction.

2. Alternatively, we can insert the first ball AND insert the remaining $b - 1$ balls. The first ball has the effect of dividing the tube in which it is placed into two tubes: the part above it and the part below. Thus

$f(b, t) = t f(b - 1, t + 1)$,

and we can again use induction.

(b) We give two solutions:

Construct a list of length $t + b - 1$ containing each ball exactly once and containing $t - 1$ copies of "between tubes." This can be done in $\binom{t+b-1}{t-1}\, b!$ ways: choose the positions for the "between tubes" and then permute the balls to place them in the remaining $b$ positions in the list.

Alternatively, imagine an ordered $b + t - 1$ long list. Choose $t - 1$ positions to be divisions between tubes AND choose how to place the $b$ balls in the remaining $b$ positions. This gives $\binom{b+t-1}{t-1} \times b!$.
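The closed form from (b) and the two recursions from (a) can be checked against a direct count. The sketch below assumes the balls are distinguishable, the tubes are labeled, and the order of balls within a tube matters; it counts arrangements by choosing a tube for each ball and then an order within each tube. The function names are illustrative.

```python
from collections import Counter
from itertools import product
from math import comb, factorial, prod

def f_closed(b, t):
    """Closed form from part (b): C(b + t - 1, t - 1) * b!."""
    return comb(b + t - 1, t - 1) * factorial(b)

def f_direct(b, t):
    """Choose a tube for every ball, then an ordering inside each tube."""
    total = 0
    for tubes in product(range(t), repeat=b):
        sizes = Counter(tubes).values()
        total += prod(factorial(s) for s in sizes)
    return total

for b in range(1, 5):
    print([f_closed(b, t) for t in range(1, 4)], [f_direct(b, t) for t in range(1, 4)])
```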

1.5.10 (a) $f(n, k) = f(n - 1, k - 1) + (n + k - 1) f(n - 1, k)$.

(b) If the order of the blocks also mattered, the number of solutions would be $f(n, k)\, k!$. On the other hand, we can obtain such partitions by constructing an ordered list of the $n$ things and then choosing $k - 1$ places between them (without replacement) to be the divisions between boxes. This gives us a count of $n! \binom{n-1}{k-1}$.

Section  2.1 

2.1.2 (a) Since $f$ is an injection, every element of $A$ maps to a different element of $B$. Thus $B$ must have at least as many elements as $A$.

(b) Since $f$ is a surjection, every element of $B$ is the image of at least one element of $A$. Thus $A$ must have at least as many elements as $B$.

(c) Combine the two previous results.

(d) Suppose that $f$ is an injection and not a surjection. Then there is some $b \in B$ which is not the image of any element of $A$ under $f$. Hence $f$ is an injection from $A$ to $B - \{b\}$. By (a), $|A| \le |B - \{b\}| < |B|$, contradicting $|A| = |B|$.

Now suppose that $f$ is a surjection and not an injection. Then there are distinct $a, a' \in A$ such that $f(a) = f(a')$. Consider the function $f$ with domain restricted to $A - \{a'\}$. It is still a surjection to $B$ and so by (b) $|B| \le |A - \{a'\}| < |A|$, contradicting $|A| = |B|$.

(e) By the previous part, if $f$ is either an injection or a surjection, then it is both, which is the definition of a bijection.

Section  2.2 

2.2.2.  Imagine  writing  the  permutation  in  cycle  form.  Look  at  the  cycle  containing  1,  starting  with 
1.  There  are  n  —  1  choices  for  the  second  element  of  the  cycle  AND  then  n  —  2  choices  for  the  third 
element  AND  •  •  •  AND  (n  —  k  +  1)  choices  for  the  kth  element. 

(a)  The  answer  is  given  by  the  Rule  of  Product  and  the  above  result  with  k  =  n. 

(b) We write the cycle containing 1 in cycle form as above AND then permute the remaining $n - k$ elements of $\{1, \ldots, n\}$ in any fashion. For the $k$ long cycle containing 1, the above result gives $(n - 1)(n - 2) \cdots (n - k + 1)$ choices. There are $(n - k)!$ permutations on a set of size $n - k$. Putting this all together using the Rule of Product, we get $(n - 1)!$, a result which does not depend on $k$.

(c)  Since  each  permutation  has  probability  1/n!,  this  follows  immediately  from  the  previous  parts. 

2.2.3. The interchanges can be written as (1,3), (1,4) and (2,3). Thus the entire set gives $1 \to 3 \to 2$, $2 \to 3$, $3 \to 1 \to 4$ and $4 \to 1$. In cycle form this is (1,2,3,4). Thus five applications takes 1 to 2.
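The bookkeeping above can be replayed in code. This sketch assumes the interchanges are applied in the order listed and that each one swaps the two named values, which matches the chain $1 \to 3 \to 2$ in the solution; the function names are illustrative.

```python
def apply_interchanges(interchanges, n):
    """Return {start: end} after applying the interchanges in order to 1..n."""
    perm = {i: i for i in range(1, n + 1)}
    for a, b in interchanges:
        swap = {a: b, b: a}
        perm = {i: swap.get(v, v) for i, v in perm.items()}
    return perm

def cycles(perm):
    """Cycle form of a permutation given as a dict."""
    seen, result = set(), []
    for start in sorted(perm):
        if start not in seen:
            cycle, x = [], start
            while x not in seen:
                seen.add(x)
                cycle.append(x)
                x = perm[x]
            result.append(tuple(cycle))
    return result

def image_after(perm, x, times):
    """Where x ends up after applying the permutation `times` times."""
    for _ in range(times):
        x = perm[x]
    return x

p = apply_interchanges([(1, 3), (1, 4), (2, 3)], 4)
print(p, cycles(p), image_after(p, 1, 5))
```

Since the cycle has length 4, five applications have the same effect as one.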
2.2.4. We look at $P_k(n + 1)$. Suppose $n + 1$ is in a cycle with $j$ other elements. Then we must have $0 \le j < k$. The $j$ elements can be chosen in $\binom{n}{j}$ ways, AND the $j + 1$ elements can be formed into a $(j + 1)$-cycle in $j!$ ways, AND the remaining $n - j$ elements of $\{1, \ldots, n + 1\}$ can be arranged into cycles in $P_k(n - j)$ ways. By the Rules of Sum and Product

$$P_k(n + 1) = \sum_{j=0}^{k-1} \binom{n}{j}\, j!\, P_k(n - j).$$

We claim this is valid for $n \ge 0$. To see this, note two things: First, if the term $j = n$ occurs, there are no $n - j$ elements to arrange and so we should not have the factor $P_k(0)$. Since $P_k(0) = 1$, that accomplishes the same thing as dropping the factor $P_k(0)$. Second, any terms with $j > n$ should be removed since this is impossible. When $j > n$, $\binom{n}{j} = 0$ and so those terms are all zero.

2.2.5 (a) This was done in Exercise 2.2.2, but we'll redo it. If $f(k) = k$, then the elements of $\{1, \ldots, n\} - \{k\}$ can be permuted in any fashion. This can be done in $(n - 1)!$ ways. Since there are $n!$ permutations, the probability that $f(k) = k$ is $(n - 1)!/n! = 1/n$. Hence the probability that $f(k) \ne k$ is $1 - 1/n$.

(b) By the independence assumption, the probability that there are no fixed points is $(1 - 1/n)^n$. One of the standard results in calculus is that this approaches $1/e$ as $n \to \infty$. (You can prove it by writing $(1 - 1/n)^n = \exp(\ln(1 - 1/n)/(1/n))$, setting $1/n = x$ and using l'Hôpital's Rule.)

(c) Choose the $k$ fixed points AND construct a derangement of the remaining $n - k$. This gives us $\binom{n}{k} D_{n-k}$. Now use $D_{n-k} \approx (n - k)!/e$.
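The count in (c) and the approximation $D_n \approx n!/e$ can both be tested by listing permutations of a small set. A minimal sketch with illustrative names; here $D_m$ is obtained by brute force rather than from any formula.

```python
from itertools import permutations
from math import comb, e, factorial

def fixed_point_counts(n):
    """counts[k] = number of permutations of an n-set with exactly k fixed points."""
    counts = [0] * (n + 1)
    for p in permutations(range(n)):
        counts[sum(1 for i, v in enumerate(p) if i == v)] += 1
    return counts

# D[m] = number of derangements of an m-set, by brute force.
D = [fixed_point_counts(m)[0] for m in range(8)]
print(D)
print([round(factorial(m) / e) for m in range(1, 8)])
print(fixed_point_counts(6))
```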

2.2.6 (a) To construct a permutation with $k$ cycles, break it into cases according to the number $i$ of elements in the cycle with 1. For each such case, choose those $i$ elements AND then construct a cycle containing them and 1 AND construct a permutation of the remaining $n - i$ elements that contains exactly $k - 1$ cycles.

(b) It is valid for all positive $n$ and $k$ if we set $z(0, 0) = 1$ and set $z(0, j) = z(j, 0) = 0$ for $j \ne 0$.

(c) The rows correspond to values of n and the columns to values of k.

             1    2    3    4    5
        1    1    0    0    0    0
        2    1    1    0    0    0
        3    2    3    1    0    0
        4    6   11    6    1    0
        5   24   50   35   10    1

2.2.7. For $1 \le k \le n - 1$, $E(|a_k - a_{k+1}|) = E(|i - j|)$, where the latter expectation is taken over all $i \ne j$ in $\{1, \ldots, n\}$. Thus the answer is $(n - 1)$ times the average of the $n(n - 1)$ values of $|i - j|$ and so

$$\text{answer} = \frac{n-1}{n(n-1)} \sum_{i \ne j} |j - i| = \frac{2}{n} \sum_{j=1}^{n} \sum_{i=1}^{j} (j - i) = \frac{2}{n} \sum_{j=1}^{n} \binom{j}{2}, \quad \text{proving (a)},$$

$$= \frac{1}{n} \left( \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2} \right) = \frac{n^2 - 1}{3}.$$
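The closed form $(n^2 - 1)/3$ can be confirmed exactly for small $n$ by averaging over all permutations. This is an illustrative sketch (the function name is hypothetical) that uses exact rational arithmetic to avoid rounding.

```python
from fractions import Fraction
from itertools import permutations

def expected_total_jump(n):
    """Average of |a_1 - a_2| + ... + |a_{n-1} - a_n| over all permutations of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    total = sum(sum(abs(a - b) for a, b in zip(p, p[1:])) for p in perms)
    return Fraction(total, len(perms))

for n in range(2, 8):
    print(n, expected_total_jump(n), Fraction(n * n - 1, 3))
```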
Section 2.3

2.3.2 (a) The coimage of a function is a partition of the domain with one block for each element of Image($f$).

(b) You can argue this directly or apply the previous result. In the latter case, note that since Coimage($f$) is a partition of $A$, $|\mathrm{Coimage}(f)| = |A|$ if and only if each block of Coimage($f$) contains just one element. On the other hand, $f$ is an injection if and only if no two elements of $A$ belong to the same block of Coimage($f$).

(c) By the first part, this says that $|\mathrm{Image}(f)| = |B|$. Since Image($f$) is a subset of $B$, it must equal $B$.

2.3.3. We can form the permutations of the desired type by first constructing a partition of $\{1, \ldots, n\}$ counted by $B(n, b)$ AND then forming a cycle from each block of the partition. The argument used in Exercise 2.2.2 proves that there are $(k - 1)!$ cycles of length $k$ that can be made from a $k$-set.

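2.3.4. The sequence must be $a_1 < \cdots < a_k > \cdots > a_{2k-1}$ for some $k > 0$. Let $a_k = t$. Then there are $\binom{t-1}{k-1}$ possibilities for $a_1, \ldots, a_{k-1}$. The same answer holds for $a_{k+1}, \ldots, a_{2k-1}$. Thus we have

$$\sum_{t=1}^{n} \sum_{k=1}^{t} \binom{t-1}{k-1} \binom{t-1}{k-1} = \sum_{t=1}^{n} \sum_{k=1}^{t} \binom{t-1}{k-1}^{2}.$$

Using results from Exercise 1.4.4 (p. 32), we can simplify this:

$$\sum_{k=1}^{t} \binom{t-1}{k-1}^{2} = \sum_{k=1}^{t} \binom{t-1}{k-1} \binom{t-1}{t-k} = \binom{2t-2}{t-1},$$

and so the answer is

$$\sum_{t=1}^{n} \binom{2t-2}{t-1} = \sum_{i=0}^{n-1} \binom{2i}{i}.$$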

2.3.5 (a) In the order given, they are 2, 1, 3 and 4.

(b) If $f$ is associated with a partition $B$ of $\{1, \ldots, n\}$, then $B$ is the coimage of $f$ and so $f$ determines $B$.

(c) See (b).

(d) The first is not since $f(1) = 2 \ne 1$.
The second is: just check the conditions.
The third is not since $f(4) - 1 = 2 > \max(f(1), f(2), f(3)) = 1$.
The fourth is: just check the conditions.

(e) In a way, this is obvious, but it is tedious to write out a proof. By definition $f(1) = 1$. Choose $k > 1$ such that $f(x) = k$ for some $x$. Let $y$ be the least element of $\{1, \ldots, n\}$ for which $f(y) = k$. By the way $f$ is constructed, $y$ is not in the same block with any $t < y$. Thus $y$ is the smallest element in its block and so $f(y)$ will be the smallest number exceeding all the values that have been assigned for $f(t)$ with $t < y$. Thus the maximum of $f(t)$ over $t < y$ is $k - 1$ and so $f$ is a restricted growth function.

(f) The functions are given in one-line form with the corresponding partition beside each.

    1111  {1,2,3,4}        1112  {1,2,3}{4}       1121  {1,2,4}{3}
    1122  {1,2}{3,4}       1123  {1,2}{3}{4}      1211  {1,3,4}{2}
    1212  {1,3}{2,4}       1213  {1,3}{2}{4}      1221  {1,4}{2,3}
    1222  {1}{2,3,4}       1223  {1}{2,3}{4}      1231  {1,4}{2}{3}
    1232  {1}{2,4}{3}      1233  {1}{2}{3,4}      1234  {1}{2}{3}{4}
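The list in (f) can be generated rather than written out by hand. The sketch below assumes the definition implied by (d) and (e): $f(1) = 1$ and each later value is at most one more than the largest value used so far. The function names are illustrative.

```python
def restricted_growth_functions(n):
    """Yield all restricted growth functions on {1, ..., n} (n >= 1) in one-line form."""
    def extend(prefix, current_max):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for value in range(1, current_max + 2):     # at most one new block label
            yield from extend(prefix + [value], max(current_max, value))
    yield from extend([1], 1)                        # f(1) = 1

def blocks(f):
    """Partition of {1, ..., n} whose i-th block is the set of x with f(x) = i."""
    result = {}
    for element, label in enumerate(f, start=1):
        result.setdefault(label, []).append(element)
    return [result[label] for label in sorted(result)]

for f in restricted_growth_functions(4):
    print("".join(map(str, f)), blocks(f))
```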
2.3.6. If no block contained more than $(|S| - 1)/k$ elements, the number of elements would be at most

$$\frac{|S| - 1}{k} \cdot k = |S| - 1 < |S|,$$

which cannot be.


2.3.7. The coimage is a partition of $A$ into at most $|B|$ blocks, so our bound is $1 + (|A| - 1)/|B|$.

2.3.8. There are only $n$ possible values of $m$, so some value must occur more than $((n + 1) - 1)/n = 1$ times. Given two numbers with the same $m$, one divides the other.

2.3.9. If $s < t$ and $f(s) = f(t)$, that tells us that we cannot put $a_s$ at the start of the longest decreasing subsequence starting with $a_t$ to obtain a decreasing subsequence. (If we could, we'd have $f(s) \ge f(t) + 1$.) Thus, $a_s \not> a_t$. Hence the subsequence $a_i, a_j, \ldots$ constructed in the problem is increasing.

Now we're ready to start the proof. If there is a decreasing subsequence of length $n + 1$ we are done. If there is no such subsequence, $f$ maps $\{1, \ldots, \ell\}$ into $\{1, \ldots, n\}$. By the generalized Pigeonhole Principle, there is some $k$ such that $f(t) = k$ for at least $\ell/n$ values of $t$. Thus it suffices to have $\ell/n > m$. In other words $\ell > mn$.

2.3.10. Suppose $S$ contains $p$ pairs of integers. If $p > N$, then two sums must have the same remainder when divided by $N$ since only $N$ remainders are possible.

If the numbers in a pair must be distinct, there are $\binom{t}{2}$ pairs and so we need $N < \binom{t}{2} = \frac{t(t-1)}{2}$. Solving this quadratic inequality, we obtain $t > \frac{1 + \sqrt{1 + 8N}}{2}$ or $t < \frac{1 - \sqrt{1 + 8N}}{2}$. The latter solution makes no sense since we know $t > 0$.

If the pair may contain two copies of the same number, there are $\binom{t}{2} + t = \frac{t(t+1)}{2}$ pairs and we obtain $t > \frac{-1 + \sqrt{1 + 8N}}{2}$.

2.3.11. Let the elements be $s_1, \ldots, s_n$, let $t_0 = 0$ and let $t_i = s_1 + \cdots + s_i$ for $1 \le i \le n$. By the Pigeonhole Principle, two of the $t$'s have the same remainder on division by $n$, say $t_j$ and $t_k$ with $j < k$. It follows that $t_k - t_j = s_{j+1} + \cdots + s_k$ is a multiple of $n$.
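The pigeonhole argument is constructive, so it translates directly into a search for the block $s_{j+1}, \ldots, s_k$. A minimal sketch with an illustrative function name: it records the first index at which each remainder of a partial sum appears and stops at the first repeat.

```python
import random

def block_with_sum_divisible_by_n(s):
    """Return consecutive entries s[j:k] whose sum is a multiple of n = len(s)."""
    n = len(s)
    first_index = {0: 0}              # remainder of t_0 = 0 occurs at index 0
    total = 0
    for k in range(1, n + 1):
        total += s[k - 1]             # total is now t_k
        r = total % n
        if r in first_index:          # t_j and t_k share a remainder
            return s[first_index[r]:k]
        first_index[r] = k
    raise AssertionError("unreachable: n + 1 partial sums, only n remainders")

print(block_with_sum_divisible_by_n([3, 1, 4, 1, 5]))
```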


Section  2.4 

2.4.1. $x(x + y) = xx + xy = x + xy = x$.

2.4.2. Here is a truth table proof.

    x   y   x'  y'  x' ⊕ y'   x ⊕ y
    0   0   1   1      0        0
    0   1   1   0      1        1
    1   0   0   1      1        1
    1   1   0   0      0        0

2.4.3. We state the laws and whether they are true or false. If false we give a counterexample.

(a) $x + (yz) = (x + y)(x + z)$ is true. (Proved in text.)

(b) $x(y \oplus z) = (xy) \oplus (xz)$ is true.

(c) $x + (y \oplus z) = (x + y) \oplus (x + z)$ is false with $x = y = z = 1$.

(d) $x \oplus (yz) = (x \oplus y)(x \oplus z)$ is false with $x = y = 1$, $z = 0$.
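With only three variables, each law can be checked over all eight assignments. The sketch below encodes $+$ as `|`, product as `&` and $\oplus$ as `^` on the values 0 and 1; the dictionary and function names are illustrative.

```python
from itertools import product

# Each entry is (left side, right side) of the proposed law.
LAWS = {
    "a": (lambda x, y, z: x | (y & z), lambda x, y, z: (x | y) & (x | z)),
    "b": (lambda x, y, z: x & (y ^ z), lambda x, y, z: (x & y) ^ (x & z)),
    "c": (lambda x, y, z: x | (y ^ z), lambda x, y, z: (x | y) ^ (x | z)),
    "d": (lambda x, y, z: x ^ (y & z), lambda x, y, z: (x ^ y) & (x ^ z)),
}

def counterexamples(lhs, rhs):
    """All (x, y, z) in {0,1}^3 where the two sides disagree."""
    return [v for v in product((0, 1), repeat=3) if lhs(*v) != rhs(*v)]

for name, (lhs, rhs) in LAWS.items():
    print(name, counterexamples(lhs, rhs))
```

An empty list means the law holds; for (c) and (d) the lists include the counterexamples quoted above.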
2.4.4. It suffices to show that a NAND can be written using NORs. We have $u' = \mathrm{NOR}(u)$ and $uv = \mathrm{NOR}(u', v')$, so $\mathrm{NAND}(u, v) = (uv)' = \mathrm{NOR}(\mathrm{NOR}(u', v'))$.

2.4.5. We use algebraic manipulation. Each step involves a simple formula, which we will not bother to mention. You could also write down the truth table, read off a disjunctive normal form and try to reduce the number of terms.

(a) $(x \oplus y)(x + y) = (xy' + x'y)(x + y) = xy' + x'yx + xy'y + x'y = xy' + x'y$. Note that this is $x \oplus y$.

(b) $(x + y) \oplus z = (x + y)z' + (x + y)'z = xz' + yz' + x'y'z$.

(c) $(x + y + z) \oplus z = (x + y + z)z' + (x + y + z)'z = xz' + yz' + x'y'z'z = xz' + yz'$.

(d) $(xy) \oplus z = xyz' + (xy)'z = xyz' + x'z + y'z$.

2.4.6.  This  is  the  function  c  in  Figure  2.1. 

2.4.7. There are many possible answers. A complicated one comes directly from the truth table and contains 8 terms. The simplest form is $xw + yw + zw + xyz$. This can be obtained as follows. $(x + y + z)w$ will give the correct answer except when $x = y = z = 1$ and $w = 0$. Thus we could simply add the term $xyzw'$. By noting that it is okay to add $xyz$ when $w = 1$, we obtain $(x + y + z)w + xyz$.

2.4.8. There are many possible answers. If we note that $zw$ gives the correct answer unless $zw = 0$, $x \ne y$ and at least one of $z$ and $w$ is one, we obtain

$zw + (xy' + x'y)(z + w) = zw + xy'z + xy'w + x'yz + x'yw$.


Section 3.1

3.1.1. From the figures in the text, we see that they are 123, 132 and 321.

3.1.2. We will not draw the tree.

(a) 8 and 19.

(b) 1432 and 3241.

3.1.3. We will not draw the tree. The root is 1, the vertices on the next level are 21 and 12 (left to right). On the next level, 321, 231, 213, 312, 132, and 123. Finally, the leaves are 4321, 3421, 3241, 3214, 4231, 2431, 2341, 2314, 4213, 2413, 2143, 2134, and so on.

(a) 7 and 16.

(b) 2,4,3,1 and 3,1,2,4.

3.1.4. We will not draw the tree. The root is 1, the vertices on the next level are 21 and 12 (left to right). On the next level, 312, 231, 213, 321, 132, and 123. Finally, the leaves are 4123, 3421, 3142, 3124, 4312, 2413, 2341, 2314, 4132, 2431, 2143, 2134, 4213, 3412, 3241, 3214, 4321, 1423, 1342, 1324, 4231, 1432, 1243, 1234.

(a) 7 and 8.

(b) 2413 and 3214.

3.1.5. We will not draw the tree. There are nine sequences: ABABAB, ABABBA, ABBABA, ABBABB, BABABA, BABABB, BABBAB, BBABAB and BBABBA.
3.1.6. We will not draw the tree.

(a)  3  and  10. 

(b)  4,3,2,1  and  6,4,3,1. 

(c)  6,5,4,3  has  rank  14. 

(e) The decision tree corresponds to 4 element subsets of $\underline{6}$. The leaf 5431 corresponds to the subset {5,4,3,1}.

3.1.7. We will not draw the tree.

(a)  5  and  18. 

(b)  111  and  433. 

(c)  4,4,4  has  rank  19. 

(e)  The  decision  tree  for  the  strictly  decreasing  functions  is  interspersed.  To  find  it,  discard  the 
leftmost  branch  leading  out  of  each  vertex  except  the  root  and  then  discard  those  decisions 
that  no  longer  lead  to  a  leaf  of  the  original  tree. 

3.1.8. We don't want to draw the tree since we saw that it had 648 leaves in Section 1.1. It is
somewhat  irregular.  Usually  it  has  4  choices  after  a  consonant  and  3  after  a  vowel.  This  is  not 
always  true  though.  In  summary,  the  tree  does  not  appear  to  have  a  very  nice  structure. 

3.1.9.  We  assume  that  you  are  looking  at  decision  trees  in  the  following  discussion. 

(a)  The  permutation  of  rank  0  is  the  leftmost  one  in  the  tree  and  so  each  element  is  inserted  as 
far  to  the  left  as  possible.  Thus  the  answer  is  n,  (n  —  1), . . . ,  2, 1. 

The  permutation  of  rank  n!  —  1  is  the  rightmost  one  in  the  tree  and  so  each  element  is 
inserted  as  far  to  the  right  as  possible.  Thus  the  answer  is  1, 2, 3, . . . ,  n. 

We  now  look  at  n!/2.  Note  that  the  decision  about  where  to  insert  2  splits  the  tree  into 
two equal pieces. We are interested in the leftmost leaf of the righthand piece. The righthand
piece  means  we  take  the  branch  1, 2.  To  stay  to  the  left  after  that,  3  through  n  are  inserted  in 
the  leftmost  position.  Thus  the  permutation  is  n,  (n  —  1), . . . ,  4, 3, 1, 2. 

(b)  The  permutation  of  rank  0  is  the  leftmost  one  in  the  tree  and  so  each  element  is  inserted  as 
far  to  the  left  as  possible.  It  begins  2,1.  Then  3  "bumps"  2  to  the  end:  3,1,2.  Next  4  "bumps" 
3  to  the  end:  4,1,2,3.  In  general,  we  have  n,  1, 2, 3, . . . ,  (n  —  1). 

The  permutation  of  rank  n!  —  1  is  the  rightmost  one  in  the  tree  and  so  each  element  is 
inserted  as  far  to  the  right  as  possible.  Thus  the  answer  is  1,2,3,...,  n. 

We  now  look  at  n!/2.  Note  that  the  decision  about  where  to  insert  2  splits  the  tree  into 
two  equal  pieces.  We  are  interested  in  the  leftmost  leaf  of  the  righthand  piece.  The  righthand 
piece  means  we  take  the  branch  1,  2.  To  stay  to  the  left  after  that,  3  through  n  are  inserted  in 
the leftmost position. This leads to "bumping" as it did for rank 0. Thus the permutation is
n, 2, 1, 3, 4, 5, ..., (n-1).

(c)  You  should  be  able  to  see  that  the  permutation  (1,  2,  3, . . . ,  n)  has  rank  0  in  both  cases  and 
that  the  permutation  (n, . . . ,  3,  2, 1)  has  rank  n!  —  1  in  both  cases. 

First  suppose  that  n  =  2m,  an  even  number.  It  is  easy  to  see  how  to  split  the  tree  in  half 
based  on  the  first  decision  as  we  did  for  insertion  order:  Choose  m+1  and  then  stay  as  left  as 
possible.  This  means  everything  is  in  order  except  for  m  +  1.  Thus  the  permutation  is  m  +  1 
followed  by  the  elements  of  n  —  {m  +  1}  in  ascending  order. 

Now  suppose  that  n  =  2m  —  1.  In  this  case,  we  must  make  the  middle  choice,  m  and 
split  the  remaining  tree  in  half,  going  to  the  leftmost  leaf  of  the  right  part.  If  you  look  at  some 
trees,  you  should  see  that  this  leads  to  the  permutation  m,  m  +  1  followed  by  the  elements  of 
n  —  {m,  m  +  1}  in  ascending  order. 
[Figure S.3.1: decision tree with branches labeled N and Y; the third-level vertices are labeled $10\binom{4}{3}\,9\binom{4}{2}$, $9\binom{4}{3}\,8\binom{4}{2}$ and $9\binom{4}{3}\,2$.]

Figure S.3.1 The decision tree for forming three full houses. The numbers at each vertex in the kth level are the number of ways to form the kth hand. Going from the root to a leaf involves two ANDs. The total number of possibilities is about 1.8 × 10^10.


3.1.10. By Example 1.14 (p. 18), the first full house can be formed in 3,744 ways. For the second full house, we must decide whether or not its pair has the same value as the pair in the first full house. (This is such a simple situation that we don't really need a decision tree.) The number of choices for the second hand is

$11\binom{4}{3} \times 10\binom{4}{2} + 11\binom{4}{3}\binom{2}{2} = 2,640 + 44 = 2,684$


and  the  two  hands  together  can  be  formed  in  10,048,896  ways.  If  the  order  of  the  hands  does  not 
matter,  this  should  be  divided  by  2. 
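The arithmetic in 3.1.10 is easy to reproduce. This sketch assumes the usual 52-card deck and splits the second hand into the two cases described above; the variable names are illustrative.

```python
from math import comb

# First full house: value and cards of the triple, then value and cards of the pair.
first = 13 * comb(4, 3) * 12 * comb(4, 2)

# Second full house: its triple must come from one of the 11 untouched values.
# Case 1: its pair uses another untouched value.
second_new_pair = 11 * comb(4, 3) * 10 * comb(4, 2)
# Case 2: its pair uses the two cards left over from the first hand's pair value.
second_same_pair = 11 * comb(4, 3) * comb(2, 2)

second = second_new_pair + second_same_pair
print(first, second, first * second)
```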

3.1.11 (a) We'll make a decision based on whether or not the pair in the full house has the same face value as a pair in the second hand. If it does not, there are

$\binom{11}{2}\binom{4}{2}^2 (52 - 8 - 5) = 77,220$

possible second hands. If it does, there are

$11\binom{4}{2}(52 - 8 - 3) = 2,706$

possible second hands. Adding these up gives 79,926 second hands, and multiplying by the number of possible full houses (3,744) gives us about 3 × 10^8 hands.

(b) There are various ways to do this. The decision trees are all more complicated than in the previous part.

(c)  The  order  in  which  things  are  done  can  be  very  important. 

3.1.12.  As  usual,  we'll  form  the  hands  sequentially.  The  first  decision  will  be  whether  or  not  the 
first  and  second  hands  have  pairs  with  the  same  face  value.  The  second  decision  will  be  whether  or 
not  the  pair  in  the  third  hand  has  a  face  value  that  is  the  same  as  an  earlier  pair.  We  obtain  the 
tree  in  Figure  S.3.1. 
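3.1.13. You can simply modify the decision tree in Figure 3.5 as follows: Decrease the "number of singles" values by 1 (since the desired word is one letter shorter). Throw away those that become negative; i.e., erase leaves C and H. Add a new path that has no triples, one pair and five singles. Call the new leaf X. It is then necessary to recompute the numbers. Here are the results, listed by (triples, pairs, singles), which total to 113,540:

X = (0, 1, 5):  $\binom{2}{0}\binom{5}{1}\binom{5}{5}\binom{7}{2,1,1,1,1,1} = 12,600$

(0, 2, 3):  $\binom{2}{0}\binom{5}{2}\binom{4}{3}\binom{7}{2,2,1,1,1} = 50,400$

(0, 3, 1):  $\binom{2}{0}\binom{5}{3}\binom{3}{1}\binom{7}{2,2,2,1} = 18,900$

(1, 0, 4):  $\binom{2}{1}\binom{4}{0}\binom{5}{4}\binom{7}{3,1,1,1,1} = 8,400$

(1, 1, 2):  $\binom{2}{1}\binom{4}{1}\binom{4}{2}\binom{7}{3,2,1,1} = 20,160$

(1, 2, 0):  $\binom{2}{1}\binom{4}{2}\binom{3}{0}\binom{7}{3,2,2} = 2,520$

(2, 0, 1):  $\binom{2}{2}\binom{3}{0}\binom{4}{1}\binom{7}{3,3,1} = 560$
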

Section  3.2 

3.2.1.  We  use  the  rank  formula  in  the  text  and,  for  unranking,  a  greedy  algorithm. 

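(a) $\binom{10}{3} + \binom{5}{2} + \binom{3}{1} = 133$.    $\binom{8}{4} + \binom{5}{3} + \binom{2}{2} + \binom{0}{1} = 81$.

(b) We have $35 = \binom{7}{4}$ so the first answer is 8,3,2,1. The second answer is 12,9,6,5 because

$\binom{11}{4} \le 400 < \binom{12}{4}$,    $400 - \binom{11}{4} = 70$

$\binom{8}{3} \le 70 < \binom{9}{3}$,    $70 - \binom{8}{3} = 14$

$\binom{5}{2} \le 14 < \binom{6}{2}$,    $14 - \binom{5}{2} = 4$

$\binom{4}{1} \le 4 < \binom{5}{1}$.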

(c)  9,6,4,2,1  and  9,7,2,1. 

(d)  9,5,4,3,2  and  9,6,5,3. 
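The rank formula and the greedy unranking used in 3.2.1 can be sketched as follows. The code assumes the rank of a strictly decreasing function f(1) > ... > f(k) is the sum of the binomial coefficients C(f(i) - 1, k + 1 - i), which is the form consistent with the worked values above; the function names are illustrative.

```python
from math import comb

def rank_decreasing(f):
    """Rank of a strictly decreasing sequence f(1) > f(2) > ... > f(k)."""
    k = len(f)
    return sum(comb(v - 1, k - i) for i, v in enumerate(f))

def unrank_decreasing(r, k):
    """Greedy unranking: at each step take the largest binomial that fits."""
    f = []
    for j in range(k, 0, -1):
        v = j - 1                      # comb(j - 1, j) == 0 always fits
        while comb(v + 1, j) <= r:
            v += 1
        f.append(v + 1)
        r -= comb(v, j)
    return f

print(unrank_decreasing(35, 4), unrank_decreasing(400, 4))
```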

3.2.2.  We  use  the  rank  formula  in  the  text  and,  for  unranking,  a  greedy  algorithm. 

(a)  635124:  The  decisions  are  0,2,0,3,5  and  so  the  rank  is 

0  X  6!/2!  +  2  X  6!/3!  +  0  X  6!/4!  +  3  X  6!/5!  +  5  X  6!/6!  =  263. 

4,5,6,1,2,3:  The  decisions  are  0,0,3,3,3  and  the  rank  is  111. 

(b) For rank 151 we have 151/(6!/2!) is 0 remainder 151, 151/(6!/3!) is 1 remainder 31, 31/(6!/4!) is 1 remainder 1, 1/(6!/5!) is 0 remainder 1 and 1/(6!/6!) is 1 remainder 0. Thus the decision sequence is 0,1,1,0,1 and the permutation is 1,3,4,2,6,5.

For  rank  300,  this  procedure  gives  decision  sequence  0,2,2,0,0  and  permutation  3,4,1,2,5,6. 

(c)  The  answers  are  1,2,3,4,6,5,7,8,9  and  6,5,4,1,2,3,7,9,8. 

(d)  The  answers  are  8,9,7,1,2,3,4,5,6  and  9,8,7,5,6,4,1,2,3. 

3.2.3. One can compute the ranks by looking at the decision tree or by using the formula in Theorem 3.3. We choose the latter approach. In case (j), we have f(i) = k + j - i. (This is easily checked since this f clearly decreases by 1 as i increases by 1 and it gives f(1) = k, k + 1 and k + 2 for j = 1, 2 and 3, respectively.) By the theorem,

$\mathrm{RANK}(f) = \sum_{i=1}^{k} \binom{k+j-i-1}{k+1-i}.$

When j = 1, all the binomial coefficients are 0 and so the answer for the first function is 0.
When j = 2, all the binomial coefficients are 1 and so the answer for the second function is k.
When j = 3, we have

$\mathrm{RANK}(f) = \sum_{i=1}^{k} \binom{k+2-i}{k+1-i} = \sum_{i=1}^{k} (k+2-i) = (k+1) + k + (k-1) + \cdots + 2.$

Since the sum of the first n positive integers is n(n+1)/2, the rank is (k+1)(k+2)/2 - 1 = k(k+3)/2.

3.2.4  (a)    We  give  two  proofs. 

First proof. Since the nonincreasing functions correspond to choosing unordered samples with repetition, the number of them in $\underline{n}^{\underline{k}}$ is $\binom{n+k-1}{k}$. Other than this change from $\binom{n}{k}$, the method for proving Theorem 3.3 works and so we have $\mathrm{RANK}(f) = \sum_{i=1}^{k} \binom{f(i)+k-i-1}{k+1-i}$.

Second proof. By Example 2.11 (p. 48), there is a bijection between strictly decreasing and nonincreasing functions. The bijection φ in that example preserves the lex order of functions and hence their rank. Apply the bijection to f and then use Theorem 3.3.

(b) 5,5,4,2,1,1: $\binom{9}{6} + \binom{8}{5} + \binom{6}{4} + \binom{3}{3} + \binom{1}{2} + \binom{0}{1} = 156$.
6,3,3: $\binom{7}{3} + \binom{3}{2} + \binom{2}{1} = 40$.

(c) As with strictly decreasing functions, we find that $35 = \binom{7}{4} + \binom{2}{3} + \binom{1}{2} + \binom{0}{1}$ and that $400 = \binom{11}{4} + \binom{8}{3} + \binom{5}{2} + \binom{4}{1}$. Thus the functions are 5,1,1,1 and 9,7,5,5.

3.2.5 (a) $D_1 \times (n-1)! + D_2 \times (n-2)! + \cdots + D_{n-1} \times 1! = \sum_{k=1}^{n-1} D_k (n-k)!$.

(b) Denote the permutation by f. Let L = $\underline{n}$. For i = 1, 2, ..., n - 1 in order: let D_i be the number of elements in L which are less than f(i) and replace L with L - {f(i)}.

(c) The decision sequences are 4,4,0,1,1 and 5,1,2,0,0 and so the ranks are 579 and 636.

(d) By a greedy algorithm we get the decision sequences 1,1,1,0,1 and 2,2,2,0,0. The permutations are 2,3,4,1,6,5 and 3,4,5,1,2,6.
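The procedure in 3.2.5 translates directly into code. This is a minimal sketch of lex-order ranking and unranking of permutations of 1, ..., n using the decision sequence D_1, ..., D_{n-1}; the function names are illustrative.

```python
from math import factorial

def decisions(perm):
    """D_i = number of unused elements smaller than perm[i] (part (b))."""
    remaining = sorted(perm)
    d = []
    for v in perm[:-1]:
        idx = remaining.index(v)
        d.append(idx)
        remaining.pop(idx)
    return d

def rank_lex(perm):
    """Rank from part (a): sum of D_k * (n - k)!."""
    n = len(perm)
    return sum(dk * factorial(n - k) for k, dk in enumerate(decisions(perm), start=1))

def unrank_lex(r, n):
    """Greedy unranking: peel off one decision per factorial."""
    remaining = list(range(1, n + 1))
    perm = []
    for k in range(1, n):
        dk, r = divmod(r, factorial(n - k))
        perm.append(remaining.pop(dk))
    perm.append(remaining[0])
    return perm

print(unrank_lex(151, 6), unrank_lex(300, 6))
```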

3.2.8.  4;  14;  25;  67;  102. 

3.2.9. 00000000000000000000 = 0^20;    11000000000000000000 = 1^2 0^18;    0100;    10101100.
[Figure S.3.2: decision tree whose levels are the choices for the (2,2), (2,3), (2,4), (3,2), (3,3) and (3,4) entries.]

Figure S.3.2 The decision tree for 4 × 4 standard Latin Squares in Exercise 3.3.1.


3.2.10. If we can see some sort of pattern, we might figure out what to do. Let's make a list of the interchanges. A k means that positions k and k+1 are interchanged. Thus going from the first to the second permutation in the list has the interchange 3. The list of interchanges is 3, 2, 1, 3, 1, 2, 3, 1 repeated three times, where the last interchange goes from the last element (2,1,3,4) to the first (1,2,3,4).

That doesn't seem to help. Can we see anything else? Notice how the 4 moves step-by-step from right to left, pauses, moves step-by-step from left to right, pauses (this is the interchange from the bottom of one column to the top of the next), and then repeats the pattern twice. Let's list the interchanges when 4 pauses, dropping 4 from the list:

1,2,3    1,3,2    3,1,2    3,2,1    2,3,1  2,1,3. 

It  turns  out  this  is  the  key:  1,  2  and  3  go  through  all  their  possible  permutations  and  4  simply  moves 
through!  We  can  do  this  with  n  =  5: 

1. Start with 1,2,3,4,5.

2. Move 5 step-by-step to the left so it is in the first position, say 5, a_1, a_2, a_3, a_4.

3. Then we look in the list for n = 4 to find what follows a_1, a_2, a_3, a_4, say b_1, b_2, b_3, b_4.

4. After 5, a_1, a_2, a_3, a_4, put 5, b_1, b_2, b_3, b_4.

5. Move 5 step-by-step to the right so it is in the last position, say c_1, c_2, c_3, c_4, 5.

6. Then we look in the list for n = 4 to find what follows c_1, c_2, c_3, c_4, say d_1, d_2, d_3, d_4.

7. After c_1, c_2, c_3, c_4, 5, put d_1, d_2, d_3, d_4, 5.

8. Go to Step 2.

You  should  see  how  to  generalize  this. 
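One way to generalize the steps above is a short recursion: sweep the largest element through each permutation of the smaller elements, alternating direction. The sketch below is one possible reading of the procedure, with illustrative function names; it also recovers the interchange list quoted for n = 4.

```python
def adjacent_swap_perms(n):
    """List all permutations of 1..n so that neighbors differ by one adjacent swap."""
    if n == 1:
        return [[1]]
    result = []
    for idx, p in enumerate(adjacent_swap_perms(n - 1)):
        # n moves right-to-left on even steps and left-to-right on odd steps.
        positions = range(n - 1, -1, -1) if idx % 2 == 0 else range(n)
        for pos in positions:
            result.append(p[:pos] + [n] + p[pos:])
    return result

def interchange(a, b):
    """Return k if b is a with positions k and k+1 (1-based) interchanged."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    assert len(diff) == 2 and diff[1] == diff[0] + 1
    assert a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]]
    return diff[0] + 1

perms = adjacent_swap_perms(4)
# Include the step from the last permutation back to the first.
swaps = [interchange(perms[i], perms[(i + 1) % len(perms)]) for i in range(len(perms))]
print(swaps)
```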

Section  3.3 

3.3.1. When building an n × n Latin Square, if the first n - 1 rows have been filled in, then the last row is determined. Thus we'll omit it from the decision tree. The tree is shown in Figure S.3.2.

3.3.2. The work in drawing the decision tree can be reduced by assuming that, possibly by rotating and flipping the board, the queen in the first row is as far to the right as possible. We will simply list the solutions in the form c_1, ..., c_n, which means there is a queen in square (i, c_i) for 1 ≤ i ≤ n.

•  n  =  4  has  the  two  solutions  2,4,1,3  and  3,1,4,2. 
•  n  =  5  has  10  solutions.  Eight  come  from  3,5,2,4,1  and  its  rotations  and  reflections.  The  other
two  are  4,1,3,5,2  and  its  reflection. 

•  n  =  6  has  four  solutions:  3,6,2,5,1,4;    4,1,5,2,6,3;    2,4,6,1,3,5  and  5,3,1,6,4,2. 

•  n  =  7  has  40  solutions. 

•  n  =  8  has  92  solutions. 
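The solution counts listed for 3.3.2 can be confirmed with a small backtracking search, which walks the same decision tree row by row. This is a minimal sketch; the function name `count_queens` is illustrative.

```python
def count_queens(n):
    """Count placements of n nonattacking queens on an n x n board."""
    def place(row, cols, diag1, diag2):
        if row == n:
            return 1
        total = 0
        for c in range(n):
            # Skip squares attacked along a column or either diagonal.
            if c in cols or (row - c) in diag1 or (row + c) in diag2:
                continue
            total += place(row + 1, cols | {c}, diag1 | {row - c}, diag2 | {row + c})
        return total
    return place(0, frozenset(), frozenset(), frozenset())

print([count_queens(n) for n in range(4, 9)])
```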

3.3.3.  You  should  find  14  solutions. 

3.3.4. You should find 8 solutions. They are all obtained by rotating and flipping the solution that
has  the  L-shape  in  positions  (1,1),  (1,2)  and  (2,2).  The  positions  of  the  3  dominoes  are  then  forced. 

Section  4.1 

4.1.1. The Venn diagrams each consist of two intersecting circles.

(a) V_2 ∩ V_3 contains words of the form CVVC. We are interested in V_2 ∪ V_3, the union of the circles. Thus

|V_2 ∪ V_3| = |V_2| + |V_3| - |V_2 ∩ V_3|
            = 21^2 × 5 × 26 + 21^2 × 5 × 26 - 21^2 × 5^2
            = 21^2 × 5 × 47.

(b) We want all 4 letter words beginning and ending with consonants that are not in C_2 ∩ C_3, which is 21^2 × 26^2 - 21^4.

4.1.2. These are simply derangements: Label each married pair. Assign each person the same label as given to the pair. Let f(k) be the label of the man who is paired with the woman whose label is k. This function is a derangement if and only if nobody is paired with his spouse.

4.1.3 (a) If everyone who lost an eye also lost an arm, a leg and an ear, then there would be 70 people who lost all four.

(b) Let A be the set of people who lost an arm and L the set who lost a leg. How small can A ∩ L be? We have

|A ∩ L| = |A| + |L| - |A ∪ L| = 165 - |A ∪ L| ≥ 165 - 100 = 65.

We can now look at the set D = A ∩ L of double amputees and ask how many must have lost an eye. As above, we have

|D ∩ I| = |D| + |I| - |D ∪ I| ≥ 65 + 70 - 100 = 35,

where I is the set of people who have lost an eye. Finally, we combine these people with the 75 who have lost an ear to conclude that at least 35 + 75 - 100 = 10 must have lost all four. Thus p ≥ 10. We can achieve this by insisting that everyone lost at least three things. If the people are numbered 1-100, we can do it as follows:

lost arm:  1-80
lost leg:  1-65 and 81-100
lost eye:  1-35 and 66-100
lost ear:  1-10 and 36-100
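The assignment at the end of 4.1.3(b) can be checked with sets. This sketch assumes the ranges pair up as 80 people losing an arm, 85 a leg, 70 an eye and 75 an ear; the helper `span` is illustrative.

```python
def span(*ranges):
    """Set of people covered by inclusive (low, high) ranges."""
    people = set()
    for low, high in ranges:
        people |= set(range(low, high + 1))
    return people

lost = {
    "arm": span((1, 80)),
    "leg": span((1, 65), (81, 100)),
    "eye": span((1, 35), (66, 100)),
    "ear": span((1, 10), (36, 100)),
}

all_four = set.intersection(*lost.values())
# Fewest injuries suffered by any one of the 100 people.
min_losses = min(sum(p in s for s in lost.values()) for p in range(1, 101))
print({k: len(v) for k, v in lost.items()}, len(all_four), min_losses)
```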
4.1.4. If a hand has some set of i properties, it means that the corresponding set of i suits is not present in the hand. Thus, such a hand is formed from a set of (4 - i) × 13 cards without repetition.
This  can  be  done  in  (^%^^')  ways.  Thus  Ni  =  (■)(^%"')-       (4.3),  the  answer  is 


4.1.5 (a) A number x has a factor in common with N if and only if it is divisible by one of the primes that divide N. Thus an element of $\underline{N}$ has no factor in common with N if and only if it is in none of the sets S_k.

(b) The intersection on the left side is the set of x ∈ $\underline{N}$ that are multiples of b = p_{i_1} ··· p_{i_r}. These are b, 2b, 3b, ..., (N/b)b. Thus the set has N/b elements, as was to be proved.

(c)  By  (4.3)  and  the  previous  result,  we  have 

Replacing  Xi  by  —l/pi  in  Example  1.13,  we  obtain  the  desired  result. 
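The inclusion and exclusion count in 4.1.5 can be compared with a direct gcd count. This is a minimal sketch with illustrative function names; each intersection contributes N/b as in part (b).

```python
from itertools import combinations
from math import gcd, prod

def prime_divisors(n):
    """Distinct primes dividing n, by trial division."""
    primes, d = [], 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes

def phi_inclusion_exclusion(n):
    """Count x in 1..n with no factor in common with n."""
    primes = prime_divisors(n)
    total = 0
    for r in range(len(primes) + 1):
        for subset in combinations(primes, r):
            # Multiples of the product b of the chosen primes: n / b of them.
            total += (-1) ** r * (n // prod(subset))
    return total

print([phi_inclusion_exclusion(n) for n in (12, 30, 97)])
```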

4.1.6 (a) This can be done in various ways. One way is to permute the rows and columns of A so that the indices in K now appear as the top |K| rows and leftmost |K| columns. Everything must be zero except the (n - |K|) × (n - |K|) matrix in the lower right corner, which can be anything. This matrix contains (n - |K|)^2 entries each of which can be zero or one so the number of such matrices is $2^{(n-|K|)^2}$ by the Rule of Product.

(b) We use the Principle of Inclusion and Exclusion. Let S_i be the set of matrices with the ith row and ith column consisting entirely of zeroes. Then

$|S_{i_1} \cap \cdots \cap S_{i_r}| = z(K)$    where    $K = \{i_1, \ldots, i_r\}$.

Thus $N_r = \binom{n}{r} z_r$ and so the answer is

$\sum_{r=0}^{n} (-1)^r \binom{n}{r} z_r = \sum_{r=0}^{n} (-1)^r \binom{n}{r} 2^{(n-r)^2}.$

4.1.7. Let S_i be those lists in which the two occurrences of c_i are adjacent. Consider a list in $S_{i_1} \cap \cdots \cap S_{i_r}$. Using the hint, this can be thought of as a list made from 2m - r symbols, where for the present we regard the two occurrences of the symbol c_j as different. Since the list is a rearrangement of the symbols, there are (2m - r)! such lists. However, m - r pairs of the symbols are identical and we have treated them as different. There are $2^{m-r}$ ways to treat such symbols as different. Thus $N_r = \binom{m}{r}(2m-r)!/2^{m-r}$.
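The value of N_r in 4.1.7 can be tested by feeding it into inclusion and exclusion and comparing with a brute-force count. The sketch assumes the quantity wanted is the number of lists, with two copies of each of m symbols, in which no two equal symbols are adjacent; the function names are illustrative.

```python
from itertools import permutations
from math import comb, factorial

def no_adjacent_equal_formula(m):
    """Alternating sum of N_r = C(m, r) (2m - r)! / 2^(m - r)."""
    return sum((-1) ** r * (comb(m, r) * factorial(2 * m - r) // 2 ** (m - r))
               for r in range(m + 1))

def no_adjacent_equal_brute(m):
    """Directly count distinct lists with no two equal neighbors."""
    symbols = [s for s in range(m) for _ in range(2)]
    lists = set(permutations(symbols))
    return sum(all(a != b for a, b in zip(w, w[1:])) for w in lists)

print([no_adjacent_equal_formula(m) for m in range(1, 5)])
```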

4.1.8.  There  does  not  seem  to  be  a  simple  formula.  Let  t  be  the  number  of  Sf,  p  the  number  of  Sj 
for  which  Sj  is  also  required  and  p'  the  number  of  other  Sj's.  Then 

^^=E(r,.)0<»»-^'-''''i- 

where  the  sum  ranges  over  all  values  of  t,  p  and  p'  such  that  t  +  p  +  p'  =  r. 

4.1.9.  The  proof  is  practically  the  same  as  that  given  for  Theorem  4.1.  Instead  of  asking  how  much 
s ∈ S contributes to the sums, ask how much Pr(s) contributes.
4.1.10. Suppose s ∈ S lies in exactly j of the S_i. (Note that j ≤ m.) It contributes to exactly $\binom{j}{r}$ of the terms in the sum

$N_r = \sum |S_{i_1} \cap \cdots \cap S_{i_r}|.$

This is correct even when r > j and $\binom{j}{r} = 0$. Thus s contributes

$\sum_{i=0}^{m-k} (-1)^i \binom{k+i}{i} \binom{j}{k+i}$

to (4.16). We need to show that the sum is zero unless j = k. When j < k all terms are zero. When j = k, only the i = 0 term is nonzero and it equals 1. Suppose k < j ≤ m. The terms in the sum are zero because of $\binom{j}{k+i}$ whenever i > j - k. Since j ≤ m, we can replace the upper limit of the sum with j - k. Note that

$\binom{k+i}{i}\binom{j}{k+i} = \frac{(k+i)!}{i!\,k!} \cdot \frac{j!}{(k+i)!\,(j-k-i)!} = \frac{j!}{k!\,(j-k)!} \cdot \frac{(j-k)!}{i!\,(j-k-i)!} = \binom{j}{k}\binom{j-k}{i}.$

Thus we can rewrite the sum as

$\sum_{i=0}^{j-k} (-1)^i \binom{j}{k}\binom{j-k}{i} = \binom{j}{k}(1-1)^{j-k} = 0.$

4.1.11 (a) Let the notation be as in the proof of the Principle of Inclusion and Exclusion. The proof given in the text is easily adjusted to prove s contributes exactly $c_{t-1}(X)$ to $\sum_{i=0}^{t-1} (-1)^i S_i$. Thus the sum will be a lower bound when t is even and an upper bound when t is odd. Including the term $(-1)^t S_t$ in the sum changes upper bounds to lower bounds and vice versa since we are now considering $c_t(X)$. By considering the cases of t even and t odd separately, it is easy to see that the inequalities follow.

(b) This can be proved by induction on t using $\binom{n}{t} = \binom{n-1}{t} + \binom{n-1}{t-1}$.

4.1.12. Since |S| = N_0 we have |S_1 ∪ ··· ∪ S_m| = N_0 - E. Thus Bonferroni's inequalities give us

$-N_t \le -|S_1 \cup \cdots \cup S_m| - \sum_{r=1}^{t-1} (-1)^r N_r \le N_t$

and so

$-N_t \le |S_1 \cup \cdots \cup S_m| - \sum_{r=1}^{t-1} (-1)^{r-1} N_r \le N_t.$

4.1.13 (a) Let m = 2. Initially the N array contains

2: N_2        1: N_1        0: N_0.

With j = 0, we do i = 1 and then i = 0. The N array now contains

2: N_2        1: N_1 - N_2        0: N_0 - (N_1 - N_2).

With j = 1, we obtain

2: N_2        1: (N_1 - N_2) - N_2        0: N_0 - (N_1 - N_2).

Equation (4.16) gives

E_2 = N_2        E_1 = N_1 - 2N_2        E_0 = N_0 - N_1 + N_2,

which  agrees  with  the  values  computed  by  the  algorithm.  You  can  carry  out  similar  calculations 
for  m  =  3. 

(b)  This  can  be  done  by  carefully  carrying  out  the  steps  in  the  algorithm. 

(c) After no iterations (that is, at the start of the algorithm), N_r contains s as many times as there are sets of r indices for which (4.17) is true. If s appears in exactly p of the S_i, this number is $\binom{p}{r}$.
We  now  use  induction  on  t,  having  done  the  case  t  =  0.  After  t  —  1  iterations,  formula  (4.18) 
is  true  when  t  is  replaced  by  anything  smaller  in  it.  In  particular,  it  holds  with  t  replaced  by 
t-1. 

We  must  now  focus  on  the  inner  loop  of  the  algorithm.  What  does  it  do?  Since  never 
changes, neither does N^. Formula (4.18) gives 0 or 1 for all t according as p < m or p = m

(p  >  m  is  impossible).  This  is  the  correct  answer  for  both  Nm  and 

Back  to  the  action  of  the  inner  loop.  Again  we  can  prove  it  by  induction,  but  now  we  are 
going  from        down  to  Nq.  We  dealt  with       in  the  previous  paragraph.  If  the  inner  loop 

has done the correct thing with $N^*_{r+1}$, then the number of times s appears in the new version of $N^*_r$ is h(p, r, t-1) - h(p, r+1, t). There are various cases to consider. We'll just look at one, namely the one in which both values are binomial coefficients. Using $\binom{a}{b} = \binom{a-1}{b} + \binom{a-1}{b-1}$, we have

$\binom{p-(t-1)}{r-(t-1)} - \binom{p-t}{(r+1)-t} = \binom{p-t+1}{r-t+1} - \binom{p-t}{r-t+1} = \binom{p-t}{r-t},$

which  is  what  we  needed  to  prove.  We  leave  the  other  cases  in  (4.18)  to  you.  The  last  sentence 
in  the  exercise  follows  from  the  fact  that  all  the  numbers  we  calculate  are  nonnegative.  (This 
takes care of the problem of how we should interpret the multiset difference A - B if s appears more often in B than it does in A.)

When t ≥ m, the only time the binomial coefficient is used in (4.18) is when t = p = m and it then has the value $\binom{0}{r-m}$, which is zero unless r = m, when it is 1. Thus, for t ≥ m, h(p, r, t) equals 1 if r = p and 0 otherwise. Hence $N^*_r$ is a set containing precisely those elements that are in exactly r of the S_i.

(d)  This  is  implicit  in  the  proof  for  (c) . 
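The array computation traced in 4.1.13(a) can be run on actual sets and compared with a direct count. In this sketch the loop bounds are inferred from the m = 2 trace (for each j, the index i runs from m - 1 down to j), and (4.16) is assumed to have the form E_k = sum over i of (-1)^i C(k+i, i) N_{k+i}; both are assumptions, and the function names are illustrative.

```python
import random
from itertools import combinations
from math import comb

def intersection_sums(universe, sets):
    """N_r = sum of |S_{i_1} & ... & S_{i_r}| over all r-subsets of indices."""
    sums = []
    for r in range(len(sets) + 1):
        total = 0
        for idx in combinations(range(len(sets)), r):
            common = set(universe)
            for i in idx:
                common &= sets[i]
            total += len(common)
        sums.append(total)
    return sums

def exactly_counts(N):
    """In-place differencing as traced in part (a); returns E_0, ..., E_m."""
    N = list(N)
    m = len(N) - 1
    for j in range(m):
        for i in range(m - 1, j - 1, -1):
            N[i] -= N[i + 1]
    return N

def exactly_counts_formula(N):
    """Assumed form of (4.16)."""
    m = len(N) - 1
    return [sum((-1) ** i * comb(k + i, i) * N[k + i] for i in range(m - k + 1))
            for k in range(m + 1)]

random.seed(1)
universe = range(30)
sets = [{x for x in universe if random.random() < 0.4} for _ in range(4)]
N = intersection_sums(universe, sets)
direct = [sum(1 for s in universe if sum(s in S for S in sets) == k) for k in range(5)]
print(N, exactly_counts(N), direct)
```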

4.1.14  (a)   Si  is  the  set  of  permutations  of  n  that  fix  i  and  so 

\Si^r\...r\Si^\  =  {n-k)\. 

This  leads  to 

(b)  Choose  k  fixed  points  AND  derange  the  remaining  n  —  k  points. 

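(c) We want to prove that

$\sum_{i=0}^{n-k} (-1)^i \binom{k+i}{i}\binom{n}{k+i}(n-k-i)! = \binom{n}{k}(n-k)! \sum_{i=0}^{n-k} \frac{(-1)^i}{i!}.$

It suffices to observe that $\binom{k+i}{i}\binom{n}{k+i}(n-k-i)! = \frac{(k+i)!}{k!\,i!} \cdot \frac{n!}{(k+i)!\,(n-k-i)!}\,(n-k-i)! = \frac{n!}{k!\,i!}$ and

$\binom{n}{k}\frac{(n-k)!}{i!} = \frac{n!}{k!\,i!}.$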
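Part (b) of 4.1.14 says that a permutation with exactly k fixed points is obtained by choosing the k fixed points and deranging the rest. The sketch below checks this against a brute-force count for small n; the function names are illustrative.

```python
from itertools import permutations
from math import comb, factorial

def derangements(n):
    """D_n = n! * sum of (-1)^i / i!, computed in exact integers."""
    return sum((-1) ** i * (factorial(n) // factorial(i)) for i in range(n + 1))

def exactly_k_fixed(n, k):
    """Choose k fixed points AND derange the remaining n - k points."""
    return comb(n, k) * derangements(n - k)

def exactly_k_fixed_brute(n, k):
    """Count permutations of 0..n-1 with exactly k fixed points."""
    return sum(1 for p in permutations(range(n))
               if sum(p[i] == i for i in range(n)) == k)

print([exactly_k_fixed(5, k) for k in range(6)])
```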
4.1.15. Let S_i be the subset of $\underline{n}$ divisible by a_i. Then, N_t is the sum over all t-subsets T of A of ⌊n/lcm(T)⌋, where lcm(T) is the least common multiple of the elements of T and the floor ⌊x⌋ is the largest integer not exceeding x. For the various parts of the exercise you need the following.

• If all elements of A divide n, then ⌊n/lcm(T)⌋ = n/lcm(T).

• If no two elements of A have a common factor, then $\mathrm{lcm}(T) = \prod_{i \in T} i$.

(a) Using the previous comments in the special case k = 0, we obtain after some algebra

$\sum_{i \ge 0} (-1)^i N_i = n \prod_{a \in A} \left(1 - \frac{1}{a}\right),$

which is the Euler phi function when A is the set of prime divisors of n.

(b) The comment for (a) applies in this case as well.

(c) There is no simple formula even when k = 0 because the floor function cannot be eliminated.

(d) Now we cannot even eliminate the lcm function.

4.1.16.  We  cannot  have  x  less  than  x,  which  is  required  by  (P-1). 

4.1.17.  In  all  cases,  what  we  must  do  is  prove  that  (P-1),  (P-2)  and  (P-3)  hold.  We  omit  most  of 
them. 

(d) Since x/x = 1, (P-1) is true. Suppose that x ρ y and y ρ x. Then x/y and y/x are both integers. Since (x/y)(y/x) = 1, the only possible integer values for x/y and y/x are ±1. Since x and y are positive, it follows that x/y = 1 and so (P-2) is true. Suppose that x/y and y/z are integers. Then so is x/z and so (P-3) is true.

4.1.18. Since x ρ x, we have x τ x and so (P-1) is true for (S, τ). Suppose x τ y and y τ x. Then y ρ x and x ρ y. By (P-2) for the poset (S, ρ), x = y and so (P-2) is true for (S, τ). Suppose x τ y and y τ z. Then z ρ y and y ρ x and so z ρ x. Thus x τ z and so (P-3) is true.

4.1.19. Since every set is the union of itself, x ρ x. Suppose x ρ y and y ρ x. Let b_y be a block of y. Since x ρ y, b_x ⊆ b_y for some block b_x of x. Since y ρ x, b'_y ⊆ b_x for some block b'_y of y. Since blocks of a partition are either equal or disjoint and since b'_y ⊆ b_x ⊆ b_y, we have b_y = b'_y and so b_x = b_y. This proves that every block of y is a block of x. Hence x = y and so (P-2) is true. It is easy to prove (P-3).

4.1.20.  The  proofs  of  (P-1),  (P-2)  and  (P-3)  are  all  straightforward  uses  of  "and."  We  do  (P-3). 

(x, x') π (y, y') and (y, y') π (z, z')    means    (x ρ y and x' τ y') and (y ρ z and y' τ z');

which is    (x ρ y and y ρ z) and (x' τ y' and y' τ z');

which implies    x ρ z and x' τ z' by (P-3) for ρ and τ;

which means    (x, x') π (z, z').

4.1.21 (a) With each element s ∈ S, associate a set g(s) such that s ∈ S_i if and only if i ∈ g(s). Then E_k counts those s ∈ S for which |g(s)| = k. Since the number of s ∈ S with g(s) = y is e(y), the sum of e(y) over |y| = k also counts those s.

(b) An element s is counted in (4.14) if and only if it belongs to all S_i for which i ∈ x. This is the same as the definition of the set intersection.

(c) The sum of e(x) over all x of size k is E_k. Putting this together with (4.15), we have

$E_k = \sum_{|x|=k} \sum_{y \supseteq x} (-1)^{|y|-|x|} f(y) = \sum_{|y| \ge k} \sum_{x \subseteq y,\ |x|=k} (-1)^{|y|-k} f(y) = \sum_{|y| \ge k} (-1)^{|y|-k} \binom{|y|}{k} f(y).$

The sum of f(y) in (b) over all y of size t is N_t. Collecting terms according to |y|, we have

$E_k = \sum_{t=k}^{m} (-1)^{t-k} \binom{t}{k} N_t = \sum_{i=0}^{m-k} (-1)^i \binom{k+i}{k} N_{k+i},$

where we set t = i + k. Now use $\binom{k+i}{k} = \binom{k+i}{i}$.


Section  4.2 

4.2.1. The number of 6-long sequences made with B, R and W is 3^6 = 729, which is much too long. The number of 6-long sequences in which adjacent beads differ in color is 3 × 2^5 = 96, which is more manageable, but still quite long. We won't list them. We could "cheat" by being a bit less mechanical: If the necklace contains a B, we could start with it. There are 2^5 = 32 such necklaces, a manageable number. The only necklace without B must alternate R and W, so there is only one of them. Here are the 32 other necklaces, where a number preceding a necklace is the first place it appears in the list when considered circularly or flipped over. A zero means it was rejected because the first and last beads are the same.


 1: BRBRBR     2: BRBRBW     0: BRBRWB     3: BRBRWR     2: BRBWBR     4: BRBWBW
 0: BRBWRB     5: BRBWRW     0: BRWBRB     6: BRWBRW     0: BRWBWB     7: BRWBWR
 3: BRWRBR     8: BRWRBW     0: BRWRWB     9: BRWRWR     2: BWBRBR     4: BWBRBW
 0: BWBRWB     8: BWBRWR     4: BWBWBR    10: BWBWBW     0: BWBWRB    11: BWBWRW
 0: BWRBRB     7: BWRBRW     0: BWRBWB     7: BWRBWR     5: BWRWBR    11: BWRWBW
 0: BWRWRB    12: BWRWRW

4.2.2. There are 8 solutions with equal numbers of B's and W's, 5 with 5 B's, 4 with 6 B's and one each with 7 and 8 B's. This gives us a total of

8  +  2(5  +  4+1  +  1)  =  30. 

4.2.3 (a) Since 4 beads are used, at most 4 different kinds of beads are used. We can construct
an  arrangement  of  beads  by  choosing  the  number  of  types  that  must  appear  (1,  2,  3  OR  4), 
choosing  that  many  types  of  beads  from  the  r  types  AND  then  choosing  an  arrangement  using 
all  of  the  types  of  beads  that  we  chose. 

(b) Trivially, f(1) = 1. For f(2), our decision will be the number of beads of the first type that appear. After that, it is easy. This gives us 1 + 2 + 1 = 4. For f(3), our decision will be which bead appears twice. This gives us 3 × 2 = 6. For f(4), each bead appears once and there are 3 possibilities. Thus the number of arrangements is

$\binom{r}{1} + 4\binom{r}{2} + 6\binom{r}{3} + 3\binom{r}{4},$

which can be rewritten as r(r + 1)(r^2 + r + 2)/8, if desired.
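The closed form in 4.2.3(b) can be compared with a direct count. This sketch assumes the four beads sit on a circle and that arrangements are identified under rotations and flips, which is consistent with the values f(2) = 4, f(3) = 6 and f(4) = 3 above; the function name is illustrative.

```python
from itertools import product
from math import comb

def count_arrangements(r):
    """Distinct 4-bead circular arrangements with r bead types."""
    seen = set()
    for w in product(range(r), repeat=4):
        forms = []
        for s in (w, w[::-1]):
            for k in range(4):
                forms.append(s[k:] + s[:k])
        seen.add(min(forms))   # one canonical representative per class
    return len(seen)

for r in range(1, 6):
    by_types = comb(r, 1) + 4 * comb(r, 2) + 6 * comb(r, 3) + 3 * comb(r, 4)
    closed = r * (r + 1) * (r * r + r + 2) // 8
    print(r, count_arrangements(r), by_types, closed)
```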
4.2.4. This has nothing at all to do with symmetries. If there are F(r) ways to use r types of "beads" to do something and f(r) ways to do it so that each bead is used at least once, then

$F(r) = \sum_{k=0}^{r} \binom{r}{k} f(k).$

If at most M beads are used, then the upper limit on the sum can be replaced by M because (i) f(k) = 0 for k > M and (ii) $\binom{r}{j} = 0$ for any j satisfying r < j ≤ M.

4.2.5. The problem can be solved by either decision tree method. It is useful to note that all
solutions must begin with h because any board that starts with v can be flipped about a NW-SE
(135°) diagonal to give one that starts with h. Also note that a lexically least sequence that starts
with hv determines the entire sequence. (To see this, note that it starts hvv and look at rotations
of the board.)

We will use the second method. Our first decision will be the number of entire rows and/or
columns that are covered by two whole dominoes, for example, two dominoes in the top row or
two dominoes in the third column. Note that we cannot simultaneously cover a row and a column
because they overlap. Let the number be L. The possible values of L are 0, 1, 2 and 4. (You should
find it easy to see why L = 3 is impossible.) Note that we can always use the symmetries to make
the first domino horizontal. For L = 4, there is obviously only one solution and its lex minimal form
is hhhhhhhh. For L = 0, we use Method 1 to obtain hvvhvvhh as the only solution. (Beware:
reading the sequence in reverse does not correspond to a symmetry of the board.) For L = 1, we note
that the entire row or column must be at the edge of the board. Suppose it is the first row. Refer
back to Figure 3.15 to see that the only way to complete the board without increasing L is hvvvvh.
This is already lex minimal: hhhvvvvh. Suppose L = 2. By rotation, we can assume we have two
full rows and, because they cannot be in the middle, one of them is the first row. Again, refer to
Figure 3.15 to find how many ways we can complete the board with one more horizontal row. This
leads to six solutions: hhhhhvvh, hhhvvhhh, hhhhvhvh, hhvhvhhh, hhhhvvvv and hhvvvvhh. This
gives a total of nine solutions.
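The total of nine can be confirmed independently of the h/v encoding. The following sketch assumes the board is 4 x 4 (eight dominoes, as in hhhhhhhh) and that the symmetries are the four rotations and four reflections of the square. It enumerates all coverings and groups them into classes; all names are illustrative.

```python
def tilings(n):
    """All domino coverings of an n x n board, each a frozenset of 2-cell frozensets."""
    results, covered, placed = [], set(), []

    def solve():
        # First uncovered cell in row-major order; it must be covered going right or down.
        cell = next(((r, c) for r in range(n) for c in range(n)
                     if (r, c) not in covered), None)
        if cell is None:
            results.append(frozenset(placed))
            return
        r, c = cell
        for other in ((r, c + 1), (r + 1, c)):
            if other[0] < n and other[1] < n and other not in covered:
                covered.update((cell, other))
                placed.append(frozenset((cell, other)))
                solve()
                placed.pop()
                covered.difference_update((cell, other))

    solve()
    return results


def symmetries(n):
    """The 8 symmetries of the square as functions on cells (row, column)."""
    maps = []
    for use_flip in (False, True):
        for turns in range(4):
            def m(p, turns=turns, use_flip=use_flip):
                if use_flip:
                    p = (p[0], n - 1 - p[1])      # left-right reflection
                for _ in range(turns):
                    p = (p[1], n - 1 - p[0])      # quarter turn
                return p
            maps.append(m)
    return maps


def count_classes(n):
    """Return (number of coverings, number of classes under the 8 symmetries)."""
    all_tilings = tilings(n)
    maps = symmetries(n)
    seen, classes = set(), 0
    for t in all_tilings:
        if t in seen:
            continue
        classes += 1
        for m in maps:
            seen.add(frozenset(frozenset(m(p) for p in d) for d in t))
    return len(all_tilings), classes


print(count_classes(4))
```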

4.2.6.  There  are  4  solutions: 

[Figure: the 4 solutions.]

4.2.7. When we write out our answers, they will be in the form suggested in the problem, without
the surrounding boxes. To obtain the lex least solutions, we must linearly order the faces. Our order
will be the line of four side faces from left to right, then the top and, finally, the bottom. We use B,
R and W to denote the colors, and b, r and w to denote the number of faces of each color.

(a) Our first decision will be the number of black faces. By interchanging black and white, a
solution with b black faces can be converted to one with 6 - b, so we only need look at b = 0, 1,
2 and 3. For b = 0 and b = 1, there is obviously only one solution. For b = 2, we must decide
whether to put the second black face adjacent or opposite the first one. Here are the 4 solutions
for b < 3.

   W        W        W        W
  WWWW     BWWW     BBWW     BWBW
   W        W        W        W

For b = 3, our second decision is whether or not all three black faces share a common vertex.
This leads to just 2 solutions:

   B        W
  BBWW     BBBW
   W        W
1,1,4      W        W
          BRWW     BWRW
           W        W

1,2,3      W        R        W
          BRRW     BRWW     BRWR
           W        W        W

2,2,2      R        R        W
          BBWW     BRBW     BRBR
           R        W        W

           W        R        W
          BBRR     BBRW     BBRW
           W        W        R

Figure S.4.1  The distinct painted cubes with various numbers of faces painted Black, Red and White.


Doubling the answers for b < 3 to get those for b > 3 gives us 10 solutions.

(b) In the previous solution, we can limit ourselves to b ≤ 3. When b = 3, we need to check whether
or not one solution is converted to the other when black and white are interchanged. It is
not, so b = 3 still gives 2 solutions for a total of 6.

(c)  The  mirror  image  of  each  of  the  10  solutions  is  equivalent  to  itself,  so  there  are  still  10  solutions. 

(d) Our first decision will be the list b, r, w. By interchanging colors, we need only consider the
situations where b ≤ r ≤ w. This gives us 1,1,4, 1,2,3 and 2,2,2. Interchanging colors in all
possible ways gives rise to 3, 6 and 1 solutions, respectively, for each solution found. For 1,1,4,
our decision will be whether B and R are on adjacent or opposite faces. Each leads to one
coloring. For 1,2,3 our first decision will be the number of R's that are adjacent to the B.
One adjacency gives 1 solution and two give 2 solutions, depending on whether the R's are
adjacent or opposite each other. For 2,2,2, our first decision will be whether the B's
are adjacent or opposite. Our second decision will be whether the R's are adjacent or
opposite. Each choice leads to 1 solution except when the B's are adjacent and the R's are
adjacent. In this case there are more solutions. One possibility is to have the 4 sides be BBRR.
Another possibility is to have the 4 sides be BBRW and then place the additional R on either
the top or the bottom. These last two possibilities are mirror images of each other, but we
cannot transform one to the other with just rotations. The solutions are given in Figure S.4.1.
This gives us 2 x 3 + 3 x 6 + 6 = 30 solutions.

(e) If all 3 colors appear, there are 30 solutions. If only 1 color appears, there are obviously
3 solutions. If exactly 2 colors appear, we can first choose the 2 colors AND then use
them. By the first part of this exercise, there are 10 - 2 = 8 ways to use the colors so that both
appear. Thus we have 30 + 3 + C(3,2) x 8 = 57 solutions.

(f) Note that no color can appear more than 3 times on any given cube. Also note that at most
6 colors appear on any given cube. By looking over our previous work, we find, in the notation
of Exercise 4.2.4, that f(0) = f(1) = 0, f(2) = 1 and f(3) = 8. By looking at decision trees for
the color counts 1,1,1,3 and 1,1,2,2, we find that f(4) = C(4,1) x 2 + C(4,2) x 5 = 38. Consider f(5), which
has just the one color count list 1,1,1,1,2. There is one way to place the repeated colors. The
partially colored cube can be transformed into itself by leaving it fixed or by rotating it so that
the two colored faces are interchanged. This means that whenever we color the remaining 4 faces
with 4 distinct colors, there will be exactly one other coloring that is equivalent to it. Thus
f(5) = C(5,1)(4!/2) = 60. If you experiment a bit, you will discover that there are 24 symmetries
of the cube. If all the faces are colored differently, each of the symmetries leads to an equivalent
coloring that looks different. Thus f(6) = 6!/24 = 30. Putting all this together, we have

\[ F(r) = \binom{r}{2} + 8\binom{r}{3} + 38\binom{r}{4} + 60\binom{r}{5} + 30\binom{r}{6}. \]

z    0 1 2 2 2 3 3 3 4 4 4 4 4
D        1 2 3 2 3 3 2 3 3 3 3
A              2 0 1 2 3 3 3 3

000  1 0 0 0 0 0 0 0 0 0 0 0 0
001  1 1 0 1 1 0 1 0 0 0 0 0 0
010  1 1 1 1 1 0 1 1 0 1 0 0 0
100  1 1 1 1 1 1 1 1 1 1 1 1 0
011  1 1 1 0 1 1 0 1 0 1 1 1 1
101  1 1 1 1 1 1 0 1 1 1 0 1 1
110  1 1 1 1 1 1 1 0 1 0 1 1 1
111  1 1 1 1 0 1 1 1 1 0 1 0 1

Figure S.4.2  The 13 different Boolean functions for n = 3. Each column gives a function having the values
of z, D and A at the head of the column. The row label describes the argument; e.g., an entry in row 101 is
f(1,0,1).
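The counts in parts (a), (d) and (e) of Exercise 4.2.7 can be checked by machine. The sketch below is only an illustration: the face numbering (0 top, 1 bottom, 2 front, 3 back, 4 left, 5 right) and the two generating quarter turns are assumptions made here, not notation from the text. It builds the 24 rotations by closure and keeps one lex least coloring per class.

```python
from itertools import product

# Quarter turns written as "face i moves to position PERM[i]".
ROT_Z = (0, 1, 5, 4, 2, 3)   # about the vertical axis: front -> right -> back -> left
ROT_X = (2, 3, 1, 0, 4, 5)   # about the left-right axis: top -> front -> bottom -> back


def compose(p, q):
    """Permutation obtained by applying q first and then p."""
    return tuple(p[q[i]] for i in range(6))


def rotation_group():
    """Close the two quarter turns under composition."""
    group = {tuple(range(6))}
    frontier = list(group)
    while frontier:
        g = frontier.pop()
        for gen in (ROT_Z, ROT_X):
            h = compose(gen, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def distinct_colorings(k, group):
    """One lex least representative per class of colorings with k available colors."""
    reps = set()
    for col in product(range(k), repeat=6):
        images = []
        for g in group:
            moved = [None] * 6
            for i in range(6):
                moved[g[i]] = col[i]
            images.append(tuple(moved))
        reps.add(min(images))
    return reps


GROUP = rotation_group()
two = distinct_colorings(2, GROUP)
three = distinct_colorings(3, GROUP)
all_three_used = sum(1 for rep in three if len(set(rep)) == 3)
print(len(GROUP), len(two), len(three), all_three_used)
```

Only rotations are used here, which matches the remark in part (d) that mirror images need not be equivalent.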


4.2.8 (a) We can describe the vertices of a square or cube by specifying their coordinates. For the
square in the exercise, the coordinates are (0,0), (0,1), (1,0) and (1,1). We can interpret the
digit d at the corner with coordinates (x,y) as saying that f(x,y) = d. In a similar manner, a
cube corresponds to n = 3.

(b) We'll just do the cube. Permutations of the arguments correspond to symmetries that do not
move the point (0,0,0). Replacing x_i with x_i ⊕ 1 corresponds to reflection in the plane x_i = 1/2.
Thus everything except the c⊕ part is explained in terms of rotations and reflections of the
cube. Conversely, given any symmetry of the cube, it can be interpreted in this manner. For
example, if the x-axis is mapped so that it is parallel to the z-axis, then σ(3) = 1. The image of
(0,0,0) is (d_1, d_2, d_3). Finally, we note that c = 1 corresponds to interchanging the values of
zero and one assigned to the vertices of the cube.

(c) Our first decision was the value for z, the number of corners of the cube with zeroes. Our second
decision, when needed, was the dimension D of the part of the cube that contained all the
zeroes; e.g., if they were all on one face, D = 2. Given any vertex v with a zero, we can ask
how many zeroes we can reach from v by going to the other end of an edge containing v.
(This number is 0, 1, 2 or 3.) If a third decision was needed, it was A, the maximum of this
number over all such v. We then used Method 1. A table of the resulting 13 functions is given
in Figure S.4.2.
Section 4.3

4.3.1. The image of F is all k element subsets of n. F^{-1}(x) consists of all possible ways to arrange the
elements of x in a list. Since we are able to count lists, we know that there are k! such arrangements.
We also know that the domain of F has n!/(n - k)! elements. Thus the coimage of F consists of C(n,k) blocks all of size k!
and the union of these blocks has n!/(n - k)! elements. Thus C(n,k) = n!/((n - k)! k!).
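The counting argument can be seen concretely by building the coimage for small values. In this sketch, F sends a k-long list of distinct elements of {1, ..., n} to the set of its entries; the choice n = 5, k = 3 and the name `coimage_blocks` are illustrative.

```python
from collections import defaultdict
from itertools import permutations
from math import comb, factorial


def coimage_blocks(n, k):
    """Group the k-lists without repeats from {1, ..., n} by the set of their entries."""
    blocks = defaultdict(list)
    for lst in permutations(range(1, n + 1), k):
        blocks[frozenset(lst)].append(lst)
    return blocks


n, k = 5, 3
blocks = coimage_blocks(n, k)
block_sizes = {len(b) for b in blocks.values()}
total = sum(len(b) for b in blocks.values())
print(len(blocks), block_sizes, total)
```

Each block is the inverse image of one k-element subset, so the number of blocks times the common block size k! recovers the number of lists.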

4.3.2. This is a problem for Chapter 1 since it deals with circular sequences of distinct things! An
n-long circular sequence of distinct things can be cut in n places to get n different n-long lists and each
list is obtained exactly once this way. Thus the answer is k(k - 1) ··· (k - n + 1)/n.

4.3.3. Note that N(γ) = 0 unless γ ∈ P_8 or γ ∈ P_5. In the former case, N(γ) = C(8,3) = 56 and in the
latter case, N(γ) = C(2,1) C(3,1) = 6. Thus there are (56 + 4 x 6)/16 = 5 necklaces.

4.3.4. For γ ∈ P_8, N(γ) = k^8. For γ ∈ P_5, N(γ) = k^5. For γ ∈ P_4, N(γ) = k^4. For γ ∈ P_2,
N(γ) = k^2. For γ ∈ P_1, N(γ) = k. Thus the answer is

\[ \frac{1}{16}\left(k^8 + 4k^5 + 5k^4 + 2k^2 + 4k\right). \]
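The coefficients in this polynomial are just a tally of how many group elements have each number of cycles. The sketch below assumes the group is the dihedral group of order 16 acting on 8 bead positions (which is what the division by 16 suggests). It also rechecks the count of 5 from Exercise 4.3.3 under the assumption that three of the eight beads are of one kind. All names are illustrative.

```python
from collections import Counter
from itertools import combinations


def dihedral_group(n):
    """The n rotations and n reflections of n positions arranged on a circle."""
    perms = []
    for shift in range(n):
        perms.append(tuple((i + shift) % n for i in range(n)))   # rotation
        perms.append(tuple((shift - i) % n for i in range(n)))   # reflection
    return perms


def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles


def count_necklaces(n, k):
    """Burnside count: average of k ** (number of cycles) over the group."""
    group = dihedral_group(n)
    return sum(k ** cycle_count(g) for g in group) // len(group)


def invariant_subsets(perm, size):
    """Number of position sets of the given size mapped to themselves by perm."""
    return sum(1 for s in combinations(range(len(perm)), size)
               if {perm[i] for i in s} == set(s))


GROUP8 = dihedral_group(8)
cycle_tally = Counter(cycle_count(g) for g in GROUP8)
three_of_one_kind = sum(invariant_subsets(g, 3) for g in GROUP8) // len(GROUP8)
print(sorted(cycle_tally.items()), count_necklaces(8, 2), three_of_one_kind)
```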

4.3.5 (a) The second line consists of the first line circularly shifted by c, an integer between 0 and
n - 1; i.e., the second line is s_1, s_2, ..., s_n, where s_t = c + t if this is at most n and c + t - n,
otherwise.

(b) In addition to the elements of the cyclic group, we have permutations whose second lines are
cyclic shifts of n, ..., 2, 1.

(c) There are 0, 1 or 2 cycles of length 1 and the remaining cycles are all of length 2. If n is
odd, there is always exactly one cycle of length 1. If n is even, there is never exactly one cycle
of length 1. You can write down the cycles as follows. All numbers that are mentioned are
understood to have an appropriate multiple of n added to (or subtracted from) them so that
they lie between 1 and n inclusive. If n is odd, choose a cycle (k). The remaining cycles are
(k - t, k + t) where 1 ≤ t < n/2. If n is even, choose k ≤ n/2. There are two ways to proceed.
First, we could have all cycles of the form (k - t + 1, k + t) where 1 ≤ t ≤ n/2. Second, we
could have (k), (k + n/2) and all cycles of the form (k - t, k + t) where 1 ≤ t < n/2.

4.3.6 (a) Number the squares 1 to 16, starting in the upper left corner and proceeding left to right
one row at a time. There are just four permutations of the board, namely the cyclic group on
4 things. Here's what the permutations other than e do to the squares of the board:

(1, 4, 16, 13)(2, 8, 15, 9)(3, 12, 14, 5)(6, 7, 11, 10)
(1, 13, 16, 4)(2, 9, 15, 8)(3, 5, 14, 12)(6, 10, 11, 7)
(1, 16)(4, 13)(2, 15)(8, 9)(3, 14)(5, 12)(6, 11)(7, 10).

Thus the number of ways to choose 8 squares is

\[ \frac{1}{4}\left(\binom{16}{8} + 2\binom{4}{2} + \binom{8}{4}\right). \]

(b) We now have the dihedral group. The 4 additional permutations are

(1)(2, 5)(3, 9)(4, 13)(6)(7, 10)(8, 14)(11)(12, 15)(16)
(1, 16)(2, 12)(3, 8)(4)(5, 15)(6, 11)(7)(9, 14)(10)(13)
(1, 4)(2, 3)(5, 8)(6, 7)(9, 12)(10, 11)(13, 16)(14, 15)
(1, 13)(2, 14)(3, 15)(4, 16)(5, 9)(6, 10)(7, 11)(8, 12).

Thus the number of ways to choose 8 squares is

\[ \frac{1}{8}\left(\binom{16}{8} + 2\binom{4}{2} + \binom{8}{4} + 2\left[\binom{4}{4}\binom{6}{2} + \binom{4}{2}\binom{6}{3} + \binom{4}{0}\binom{6}{4}\right] + 2\binom{8}{4}\right). \]
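Both Burnside counts in Exercise 4.3.6 can be compared with a direct enumeration of the C(16,8) = 12870 ways to choose 8 squares. The sketch below labels squares by (row, column) rather than 1 to 16, which is a convenience assumed here. It counts classes under the rotations alone and under rotations together with reflections.

```python
from itertools import combinations
from math import comb

N = 4
CELLS = [(r, c) for r in range(N) for c in range(N)]


def rotate(cell):
    """Quarter turn of the board."""
    r, c = cell
    return (c, N - 1 - r)


def transpose(cell):
    """Reflection in the main diagonal."""
    r, c = cell
    return (c, r)


def images(subset, with_reflections):
    """All images of a set of cells under the cyclic or dihedral group."""
    starts = [subset]
    if with_reflections:
        starts.append(frozenset(transpose(p) for p in subset))
    out = []
    for s in starts:
        for _ in range(4):
            out.append(tuple(sorted(s)))
            s = frozenset(rotate(p) for p in s)
    return out


def count_orbits(size, with_reflections):
    """Number of classes of size-element sets of squares."""
    reps = {min(images(frozenset(sub), with_reflections))
            for sub in combinations(CELLS, size)}
    return len(reps)


cyclic_formula = (comb(16, 8) + 2 * comb(4, 2) + comb(8, 4)) // 4
diagonal_term = comb(4, 4) * comb(6, 2) + comb(4, 2) * comb(6, 3) + comb(4, 0) * comb(6, 4)
dihedral_formula = (comb(16, 8) + 2 * comb(4, 2) + comb(8, 4)
                    + 2 * diagonal_term + 2 * comb(8, 4)) // 8
cyclic_count = count_orbits(8, False)
dihedral_count = count_orbits(8, True)
print(cyclic_formula, cyclic_count, dihedral_formula, dihedral_count)
```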

4.3.7.  The  proof  in  the  text  shows  that  the  right  side  of  the  given  equality  is  |G|  '^g^Q  N{g).  By 
(4.20),  the  left  side  is 

yes  yes  I 

The  rest  of  the  proof  follows  easily  by  adapting  what  was  done  in  the  text.  This  seems  to  be  a 
shorter  proof  than  the  one  in  the  text.  Why  didn't  we  use  it?  First,  it's  not  particularly  shorter; 
however,  it  is  a  bit  cleaner.  Unfortunately,  it  requires  starting  with  the  completely  unmotivated 
double  summation  in  which  we  have  interchanged  the  order  of  the  sums. 

4.3.8 (a) We don't list the coverings. In general, the coverings can be made by stringing together
two kinds of "beads": a single vertical domino and a pair of horizontal dominoes, one above
the other.

(b)  In  the  previous  result,  replace  the  vertical  domino  beads  by  ones  and  the  horizontal  pairs  by 
twos.  Since  a  vertical  covers  one  column  and  a  pair  of  horizontals  covers  two,  the  numbers  add 
up  to  n. 

(c)  A  sequence  of  ones  and  twos  completely  determines  a  board.  Symmetries  of  the  board  either 
leave  a  sequence  unchanged  or  reverse  its  order. 

(d) The group of symmetries of the sequences of ones and twos contains just two elements: the
identity e and the reversal of left and right, say g. Obviously, N(e) = D(n). N(g) is associated
with the number of ways to cover a board of roughly length n/2. If n is odd, g leaves the middle
column of the board fixed and interchanges columns j and n + 1 - j for 1 ≤ j < n/2. If the
board is to be unchanged by g, the middle column must contain a vertical domino, the first
(n - 1)/2 columns can contain any covering, and the last (n - 1)/2 must contain their image
under g. Thus N(g) = D((n - 1)/2) when n is odd. When n is even, a similar argument can
be used. Now there is no center column. We can either cover the first n/2 columns or cover
the first n/2 - 1 columns and place a pair of horizontal dominoes covering columns n/2 and
n/2 + 1. Thus N(g) = D(n/2) + D(n/2 - 1) when n is even.
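The two cases for N(g) can be tested against a direct enumeration. In this sketch a covering is represented, as in parts (b) and (c), by a sequence of ones and twos adding up to n, and D(n) is the number of such sequences. The function names are illustrative.

```python
def compositions(n):
    """All sequences of ones and twos that add up to n."""
    if n == 0:
        return [()]
    out = [(1,) + rest for rest in compositions(n - 1)]
    if n >= 2:
        out += [(2,) + rest for rest in compositions(n - 2)]
    return out


def D(n):
    """Number of coverings of the board with n columns."""
    return len(compositions(n))


def classes_by_enumeration(n):
    """Count sequences up to reversal by keeping the lex least of each pair."""
    return len({min(s, s[::-1]) for s in compositions(n)})


def classes_by_burnside(n):
    """(N(e) + N(g)) / 2 with N(g) taken from the odd and even cases (n >= 1)."""
    if n % 2 == 1:
        fixed = D((n - 1) // 2)
    else:
        fixed = D(n // 2) + D(n // 2 - 1)
    return (D(n) + fixed) // 2


for n in range(1, 11):
    print(n, D(n), classes_by_enumeration(n), classes_by_burnside(n))
```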

Section  5.1 

5.1.1. The sum is the number of ends of edges since, if x and y are the ends of an edge, the edge
contributes 1 to the value of d(x) and 1 to the value of d(y). Since each edge has two ends, the sum
is twice the number of edges.

5.1.2. To specify a graph we must choose E ⊆ P_2(V). Let N = |P_2(V)|. Then there are 2^N possible
subsets E of P_2(V) and C(N,q) of them have cardinality q. Since |P_2(V)| = C(n,2), we are done with
(a) and (b). Since there are 2^N graphs, each has probability 2^{-N} and so the probability in (c) is

2^{-N} C(N,q).

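5.1.3. The graph with

\[ \begin{pmatrix} a & b & c & d & e & f & g & h & i & j & k \\ c & c & f & a & h & e & e & a & d & a & a \\ c & g & g & h & h & h & f & h & g & d & f \end{pmatrix} \]

is isomorphic to Q. The correspondence between vertices is given by

\[ \begin{pmatrix} A & B & C & D & E & F & G & H \\ h & a & c & e & f & d & g & b \end{pmatrix} \]

where the top row corresponds to the vertices of Q. The graph with E = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11} and

\[ \varphi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ A & E & E & E & F & G & H & B & C & D & E \\ G & H & E & F & G & H & B & C & D & D & H \end{pmatrix} \]

is not isomorphic to Q. One edge needs to be deleted from P'(Q) and one added.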

5.1.4. If a pictorial representation of R can be created by labeling P'(Q) with the edges and vertices
of R, then R has degree sequence (0, 2, 2, 3, 3, 3, 4, 5). The converse is false (find a counterexample).

5.1.5 (a) There is no graph Q with degree sequence (1, 1, 2, 3, 3, 5) since the sum of the degrees is
odd.

(b) There is such a graph. You should draw an example.

(c) Up to labeling, the graph is unique. Take V = {1, ..., 6} and

E = {{1,6}, {2,6}, {2,4}, {3,6}, {3,5}, {4,6}, {4,5}, {5,6}}.

(d) A graph with degree sequence (3,3,3,3) has (3 + 3 + 3 + 3)/2 = 6 edges and, of course, 4
vertices. That is the maximum number C(4,2) of edges that a graph with 4 vertices can have. It is easy to
construct such a graph. This graph is called the complete graph on 4 vertices.

(f) There is no simple graph (or graph without loops or parallel edges) with degree sequence
(3,3,3,5).

(g) Similar arguments to the (3,3,3,3) case apply to the complete graph with degree sequence
(4,4,4,4,4).
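A few of these claims are easy to confirm numerically. The sketch below computes the degree sequence of the graph given in part (c), checks that the degree sum is twice the number of edges (Exercise 5.1.1), applies the parity test from part (a), and builds the complete graph on 4 vertices from part (d). The helper names are illustrative.

```python
from collections import Counter
from math import comb


def degree_sequence(vertices, edges):
    """Sorted list of vertex degrees; a loop (u, u) adds 2 to the degree of u."""
    deg = Counter({v: 0 for v in vertices})
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values())


def parity_allows(seq):
    """Necessary condition only: the degrees of any graph sum to an even number."""
    return sum(seq) % 2 == 0


# The graph from part (c).
V = range(1, 7)
E = [(1, 6), (2, 6), (2, 4), (3, 6), (3, 5), (4, 6), (4, 5), (5, 6)]
seq_c = degree_sequence(V, E)

# The complete graph on 4 vertices from part (d).
K4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
seq_k4 = degree_sequence(range(4), K4)

print(seq_c, sum(seq_c), 2 * len(E))
print(parity_allows([1, 1, 2, 3, 3, 5]))
print(seq_k4, len(K4), comb(4, 2))
```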

Section  5.2 

5.2.1. Let ν and ε be the bijections.

(a) This follows from the fact that ν and ε are bijections.

(b) This can be seen intuitively from the drawing of the unlabeled graph. If you want a more formal
proof, first note that the degree of a vertex v is the number of edges e such that v ∈ φ(e). Now
use the fact that v ∈ φ(e) is equivalent to ν(v) ∈ φ'(ε(e)).

5.2.2. Each of (a) and (c) has just one pair of edges with the same endpoints, while (b) and (d)
each have two pairs. Thus neither (b) nor (d) is equivalent to (a) or (c). Vertex 1 of (b) has degree
4, but (d) has no vertices of degree 4. Thus (b) and (d) are not equivalent. It turns out that (a) and
(c) are equivalent. We leave it to you to find ν and ε.

5.2.3 (a) This is exactly like the next problem with the transpose replaced by the inverse
everywhere.

(b) Let I be the n x n identity matrix. Since A = IAI^t, A ~ A. Suppose that A ~ B. Then
B = PAP^t for some nonsingular P. Multiplying on the left by P^{-1} and on the right by
(P^{-1})^t = (P^t)^{-1}, we have

P^{-1} B (P^{-1})^t = (P^{-1}P) A (P^t (P^{-1})^t) = (P^{-1}P) A (P^t (P^t)^{-1}) = A.

Thus B ~ A. Suppose that A ~ B ~ C. Then we have nonsingular P and Q such that
B = PAP^t and C = QBQ^t. Thus C = Q(PAP^t)Q^t = (QP)A(P^tQ^t) = (QP)A(QP)^t. This
proves transitivity.
5.2.4. We will just explain those that are not equivalence relations.

(c)  We  could  have  three  students,  Alice,  Bill  and  Chris,  with  Chris  having  a  class  with  Alice  and 
another  class  with  Bill,  but  Alice  and  Bill  have  no  classes  in  common.  This  would  make  Chris 
equivalent  to  both  Alice  and  Bill,  but  Alice  and  Bill  would  not  be  equivalent.  This  violates  the 
transitive  property  of  equivalence  relations. 

(d)  Consider  the  numbers  0,  0.0008,  0.0016.  As  in  the  previous  question,  the  transitive  property 
fails. 

5.2.5. Let E ⊆ P_2(V) and E' ⊆ P_2(V'). Write G = (V, E) ~ (V', E') = G' if and only if there is a
bijection ν: V → V' such that {u, w} ∈ E if and only if {ν(u), ν(w)} ∈ E'.

We could show that this is an equivalence relation by adapting the proof in Example 5.5.
An alternative is to show how this definition leads to the equivalence relation for G and G'
interpreted as graphs. We'll take this approach. In this case φ and φ' are identity maps. Define
ε({u, w}) = {ν(u), ν(w)}. By our definition in the previous paragraph, ε: E → E' is a bijection.
Since φ and φ' are the identity, the requirement that φ'(ε(e)) = ν(φ(e)) in the definition of graph
isomorphism is satisfied.

5.2.6 (a) With π the identity, we see that f ~ f. Suppose that f(x) = g(π(x)) for all x ∈ A and
let y ∈ A. Since π is a bijection, π(x) = y for some x ∈ A. Thus x = π^{-1}(y) and

g(y) = g(π(x)) = f(x) = f(π^{-1}(y)).

Thus the permutation π^{-1} proves that g ~ f. (For those more familiar with manipulating
functions, we could simply say: Since f = gπ and π is a bijection, we have g = gππ^{-1} = fπ^{-1}.)
Suppose that f ~ g ~ h and that the permutations involved are π and σ. Then
f(x) = g(π(x)) = h(σ(π(x))). Since the composition σπ is a permutation of A, f ~ h.

(b) Call two functions f, g: A → B equivalent if there is a permutation π of B such that f = πg.
With π the identity function, we see that f ~ f. Suppose that f = πg. Then g = π^{-1}f and so
g ~ f. Suppose that f = πg and g = σh. Then f = (πσ)h and so f ~ h.

(c) Suppose f, g: A → B and that there are permutations α and β of A and B respectively such that
f = βgα. Then call f and g equivalent. Using the identity permutations, we have that f ~ f.
Since g = β^{-1}fα^{-1}, g ~ f. Suppose that g = β'hα'. Then f = β(β'hα')α = (ββ')h(α'α), and
so f ~ h.

5.2.7.  The  table  is  shown  in  Figure  S.5.1.  The  entries  which  are  1  follow  when  you  realize  what  is 
being  counted.  The  LL  row  corresponds  to  ordered  samples  and  the  UL  row  to  unordered  samples, 
which  have  been  considered  in  Chapter  1.  The  UL-surjection  entry  comes  from  the  realization  that 
our  sample  allows  repetition  but  must  include  every  element  in  b  so  that  we  are  only  free  to  choose 
a - b additional elements. In the LU row, the fact that the range is unlabeled means that we can
only  distinguish  functions  that  have  different  coimages.  The  UU  row  is  associated  with  partitions  of 
numbers.  We  use  p{n,  k)  to  denote  the  number  of  partitions  of  n  having  exactly  k  parts. 
A B    all                   injections             surjections

L L    b^a                   b(b-1)···(b-a+1)       b! S(a,b)
L U    Σ_{k≤b} S(a,k)        1                      S(a,b)
U L    C(a+b-1, a)           C(b, a)                C(a-1, a-b)
U U    Σ_{k≤b} p(a,k)        1                      p(a,b)

Figure S.5.1  Some basic enumeration problems.


Section  5.3 

5.3.1. Since E ⊆ P_2(V), we have a simple graph. Regardless of whether you are in set C or S,
following an edge takes you into the other set. Thus, following a path with an odd number of edges
takes you to the opposite set from where you started while a path with an even number of edges
takes you back to your starting set. Since a cycle returns to its starting vertex, it obviously returns
to its starting set.

5.3.2 (a) and (b) The solution to Exercise 5.3.1 shows that the graph is bipartite and the argument
given there for even length cycles can be used practically as is for bipartite graphs: just replace
C and V with A and B.

(c) Start with sets A and B empty. Choose any vertex v_0 and place it in the set A. While there is some
edge {u, v} with u ∈ A ∪ B and v ∉ A ∪ B, place v in the set not containing u.

Since the graph is connected, all vertices will eventually be in A or B, so we produce a
partition of V. Suppose that there is some edge e = {x, y} with both ends in A. By the way
A and B were constructed, there is a path from x to v_0 consisting solely of edges that were
used to add vertices to A and B. If the vertices on the path are c_0 = x, c_1, ..., c_n = v_0, then
c_{2i} ∈ A and c_{2i+1} ∈ B. There is a similar path d_0 = y, d_1, ..., d_m = v_0. Let c_i = d_j be the first
common vertex on the two paths. It follows that c_0 = x, c_1, ..., c_i = d_j, d_{j-1}, ..., d_0 = y, x is a
cycle of length i + j + 1. Since c_i and d_j are the same vertex, they are both in A or both in
B. Thus i and j are both even or both odd. Consequently i + j + 1 is odd, contradicting the
requirement that all cycle lengths be even. If the ends of e are in B a similar proof works.

(d) Whenever the previous algorithm stops and vertices remain, choose a remaining vertex and
place it in A. Then continue with finding an edge e.

(e) The set in which the vertex v of edge e is placed by the algorithm is forced. We have a free
choice when we arbitrarily select a vertex and place it in A. Thereafter, the algorithm places
all the other vertices that lie in the same component and, as just noted, these placements into
A or B are forced. In other words, we get one free choice of A or B for each component.

(f)  We  have  already  shown  that  all  cycles  in  a  bipartite  graph  have  even  length,  so  we  must  do 
the  reverse.  Let  G  be  an  arbitrary  graph  with  no  odd  length  cycles.  Apply  the  algorithm  for 
partitioning  V  into  A  and  B.  Now  look  back  at  the  proof  that  the  algorithm  worked.  We 
showed  that  if  there  was  an  edge  with  both  ends  in  A  or  both  in  B  then  there  was  a  cycle 
of  odd  length.  Since  G  has  no  cycles  of  odd  length,  it  follows  that  our  algorithm  provides  the 
partitioning  V  =  AU  B  required  in  the  definition  of  a  bipartite  graph. 
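The partitioning algorithm of parts (c) and (d) can be sketched in code. The version below is one possible realization, not the text's own: it processes edges in breadth-first order (the text allows any edge with exactly one placed end), starts a fresh component whenever no such edge remains, and reports failure when some edge ends up with both ends in the same set. The function name and return convention are assumptions.

```python
from collections import deque


def bipartition(vertices, edges):
    """Return (A, B) if the simple graph is bipartite, otherwise None."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    side = {}
    for start in vertices:
        if start in side:
            continue
        side[start] = "A"              # the one free choice for this component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in side:
                    # Forced: v goes in the set not containing u.
                    side[v] = "B" if side[u] == "A" else "A"
                    queue.append(v)
                elif side[v] == side[u]:
                    return None        # an edge inside A or inside B: odd cycle
    A = {v for v in vertices if side[v] == "A"}
    B = {v for v in vertices if side[v] == "B"}
    return A, B


hexagon = [(i, (i + 1) % 6) for i in range(6)]
triangle = [(0, 1), (1, 2), (2, 0)]
two_parts = [(0, 1), (2, 3)]

print(bipartition(range(6), hexagon))
print(bipartition(range(3), triangle))
print(bipartition(range(4), two_parts))
```

With c components there are 2^c possible outputs, one for each way of making the free choices described in part (e); this sketch always puts the first vertex of each component in A.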


5.3.3 (a) Let e = {u, v} and let f = {v, w} be the other edge. Since G is simple, u ≠ w. Since e is
a cut edge, u and v are in separate components of (V, E - {e}). Thus so are u and w. Since the
graph induced by V - {v} is a subgraph of (V, E - {e}), u and w are in separate components
of it as well.

(b)  Take  two  triangles  and  identify  their  tops.  The  merged  top  is  a  cut  vertex  but  the  graph  has 
no  isthmus. 

(c) We will prove that e is a cut edge if and only if its ends u and v, say, lie in different components
of G' = (V, E - {e}). The result will then follow because, first, if C is a cycle containing e,
removal of e does not leave its ends in different components, and, second, if u and v are in the
same component of G', then there is a path P connecting them in G' and P and e form a
cycle in G.

Now back to the original claim. If u and v are in different components of G', then e is a
cut edge. Suppose e is a cut edge of G. Since G is connected and every path in G that is not
a path in G' contains e, it follows that if x and y are in different components of G' any path
connecting them in G contains e. Let P be such a path and let u be the end of e first reached
on P when starting from x. It follows that x and u are in one component of G' and that y and
v (the other end of e) are in one component, too. Since x and y are in different components, so
are u and v.

(d) We claim that v ∈ V is a cut vertex of G if and only if there are two edges e and e' both
containing v such that no cycle of G contains both e and e'.

Proof. Suppose that v is a cut vertex. Let x and y belong to different components of the
graph G'' induced by V - {v}. Any path from x to y in G must include v. Let P be such a
path and let e and e' be the two edges in P that contain v. If e and e' were on a cycle C in
G, then we could remove e and e' from P and add on C - {e, e'} to obtain a route from x
to y that does not go through v. Since this contradicts the fact that x and y are in different
components of G'', it follows that e and e' do not lie in a cycle.

The steps can be reversed to prove that if e and e' are edges incident with v that do not
lie on a cycle, then v is a cut vertex: Let x and y be the other vertices on e and e'. Since e and
e' do not lie on a cycle, every path from x to y must include either e or e' (or both), and hence
includes v. Since there is no path from x to y not including v, they are in different components
of G''.

5.3.4.  The  definitions  of  connected  graphs  and  trees  would  result  in  the  same  structures. 

5.3.5  (a)   The  graph  is  not  Eulerian.  The  longest  trail  has  5  edges,  the  longest  circuit  has  4  edges. 

(b)  The  longest  trail  has  9  edges,  the  longest  circuit  has  8  edges. 

(c)  The  longest  trail  has  13  edges  (an  Eulerian  trail  starting  at  G  and  ending  at  D).  The  longest 
circuit  has  12  edges. 

(d)  This  graph  has  an  Eulerian  circuit  (12  edges). 

5.3.6  (a)    The  graph  is  Hamiltonian. 

(b)  The  graph  is  Hamiltonian. 

(c)  The  graph  is  not  Hamiltonian.  There  is  a  cycle  that  includes  all  vertices  except  K. 

(d)  The  graph  is  Hamiltonian. 
Section 5.4

5.4.1. We first prove that (b) and (c) are equivalent. We do this by showing that the negation of
(b) and the negation of (c) are equivalent. Suppose u ≠ v are on a cycle of G. By Theorem 5.3,
there are two paths from u to v. Conversely, suppose there are two paths from u to v. Call them
u = x_0, x_1, ..., x_k = v and u = y_0, y_1, ..., y_m = v. Let i be the smallest index such that x_i ≠ y_i. We
may assume that i = 1 for, if not, redefine u = x_{i-1}. On the new paths, let a > 0 be the smallest
index for which x_a is on the y path, say x_a = y_b. The walk

u = x_0, x_1, ..., x_a = y_b, y_{b-1}, ..., y_0 = u

has no repeated vertices except the first and last and so is a cycle. (A picture may help you visualize
what is going on. Draw the x path intersecting the y path several times.)

We now prove that (d) implies (b). Suppose that G has a cycle, v_0, v_1, ..., v_k, v_0. Remove the
edge {v_0, v_k}. In any walk that uses that edge, replace it with the path v_0, v_1, ..., v_k or its reverse,
as appropriate. Thus the graph is still connected and so the edge {v_0, v_k} contradicts (d).

5.4.2  (a)   This  is  a  restatement  of  the  equivalence  of  (b)  and  (d)  in  the  theorem. 

(b)  This  was  done  in  the  last  part  of  the  proof  of  the  theorem. 

(c)  Again,  this  was  done  in  the  last  part  of  the  proof  of  the  theorem. 

5.4.3 (a) By Exercise 5.1.1, we have $\sum_{v \in V} d(v) = 2|E|$. By 5.4(e), $|E| = |V| - 1$. Since
$2|V| = \sum_{v \in V} 2$, we have

$2 = 2|V| - 2|E| = \sum_{v \in V} (2 - d(v))$.

(b)  We  give  three  solutions.  The  first  uses  the  previous  result.  The  second  uses  the  fact  that  each 
tree  except  the  single  vertex  has  at  least  two  leaves.  The  third  uses  the  fact  that  trees  have  no 
cycles. 

Suppose that T is more than just a single vertex. Since T is connected, $d(v) \ne 0$ for all v.
Let $n_k$ be the number of vertices of T of degree k. By the previous result, $\sum_{k \ge 1} (2 - k) n_k = 2$.
Rearranging gives $n_1 = 2 + \sum_{k \ge 2} (k - 2) n_k$. If $n_m \ge 1$, the sum is at least $m - 2$.

For  the  second  solution,  remove  the  vertex  of  degree  m  to  obtain  m  separate  trees.  Each 
tree  is  either  a  single  vertex,  which  is  a  leaf  of  the  original  tree,  or  has  at  least  two  leaves,  one 
of  which  must  be  a  leaf  of  the  original  tree. 

For the third solution, let v be the vertex of degree m and let $\{v, x_i\}$ be the edges
containing v. Each path starting $v, x_i$ must eventually reach a leaf since there are no cycles.
Call the leaf $y_i$. These leaves are distinct since, if $y_i = y_j$, the walk $v, x_i, \ldots, y_i = y_j, \ldots, x_j, v$
would lead to a cycle.

(c) Let the vertices be u and $v_i$ for $1 \le i \le m$. Let the edges be $\{u, v_i\}$ for $1 \le i \le m$.

(d) Let $N = n_3 + n_4 + \cdots$, the number of vertices of degree 3 or greater. Note that $k - 2 \ge 1$
for $k \ge 3$. By our earlier formula, $n_1 \ge 2 + N$. If $n_2 = 0$, $N = |V| - n_1$ and so we have
$n_1 \ge 2 + |V| - n_1$. Thus $n_1 \ge 1 + |V|/2$. Similarly, if $n_2 = 1$, $N = |V| - n_1 - 1$ and, with a bit
of algebra, $n_1 \ge (1 + |V|)/2$.

(e) A careful analysis of the previous argument shows that the number of leaves will be closest to
$|V|/2$ if we avoid vertices with high degrees. Thus we will try to make our vertices of degree
three or less. We will construct some RP-trees, $T_k$ with k leaves. Let $T_1$ be the isolated vertex. For
$k > 1$, let $T_k$ have two children, one a single vertex and the other the root of $T_{k-1}$. Clearly $T_k$
has one more leaf and one more nonleaf than $T_{k-1}$. Thus the difference between the number
of leaves and nonleaves is the same for all $T_k$. For $T_1$ it is one.
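
The identity in (a) and the construction in (e) can be checked by machine. The following is a minimal sketch, not part of the original solution: the vertex numbering and the helper names `caterpillar` and `degree_check` are our own choices. It builds $T_k$ as an edge list and counts leaves as vertices of degree one, so it is meant for $k \ge 2$ (the lone vertex of $T_1$ has degree zero).

```python
from collections import Counter

def caterpillar(k):
    """Edge list of T_k from part (e): T_1 is one vertex; for k > 1 the root
    has a single-vertex child and a child that is the root of T_{k-1}."""
    edges, root, next_id = [], 0, 1
    for _ in range(k - 1):
        leaf, sub_root = next_id, next_id + 1
        edges += [(root, leaf), (root, sub_root)]
        root, next_id = sub_root, next_id + 2
    return edges, next_id  # next_id equals the number of vertices

def degree_check(edges, n_vertices):
    """Return (sum over v of 2 - d(v), number of degree-one vertices)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    total = sum(2 - deg[v] for v in range(n_vertices))
    leaves = sum(1 for v in range(n_vertices) if deg[v] == 1)
    return total, leaves
```

For every $k \ge 2$ the first returned value should be 2, as in (a), and the leaves should outnumber the nonleaves by exactly one, as in (e).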
5.4.4 (a) Suppose G is a graph with v vertices and v edges. By Theorem 5.4(b,c), the graph has a
cycle. This proves the base case, n = 0. Suppose n > 0 and G is a graph with v vertices and
v + n edges. By the theorem again, we know that the graph has a cycle. By the proof of the
theorem,  we  know  that  removing  an  edge  from  a  cycle  does  not  disconnect  the  graph.  However, 
removing  the  edge  destroys  any  cycles  that  contain  it.  Hence  the  new  graph  G'  contains  one 
less  edge  and  at  least  one  less  cycle  than  G.  By  the  induction  hypothesis,  G'  has  at  least  n 
cycles.  Thus  G  has  at  least  n  +  1  cycles. 

(b) Let G be a graph with components $G_1, \ldots, G_k$. With subscripts denoting components, $G_i$ has
$v_i$ vertices, $e_i = v_i + n_i$ edges and at least $n_i + 1$ cycles. From the last two formulas, $G_i$ has at
least $1 + e_i - v_i$ cycles. Now sum over i.

(c) There are many possibilities. Here's one solution. The vertices are v and, for $0 \le i \le n$, $x_i$ and
$y_i$. The edges are $\{v, x_i\}$, $\{v, y_i\}$, and $\{x_i, y_i\}$. (This gives n + 1 triangles joined at v.) There
are 1 + 2(n + 1) vertices, 3(n + 1) edges, and n + 1 cycles.

5.4.5. Since the tree has at least 3 vertices, it has at least 3 - 1 = 2 edges. Let e = {u, v} be an
edge.  Since  there  is  another  edge  and  a  tree  is  connected,  at  least  one  of  u  and  v  must  lie  on  another 
edge  besides  e.  Suppose  that  u  does.  It  is  fairly  easy  to  see  that  u  is  a  cut  vertex  and  that  e  is  a  cut 
edge. 

5.4.6  (a)   No  such  tree  exists.  A  tree  with  six  vertices  must  have  five  edges. 

(b)  No  such  tree  exists.  Such  a  tree  must  have  at  least  one  vertex  of  degree  three  or  more  and 

hence  at  least  three  vertices  of  degree  one. 

(c)  There  are  many;  for  example,  8  vertices  forming  a  cycle  and  two  vertices  of  degree  0. 

(d)  No  such  graph  exists.  If  it  did,  the  components  would  be  trees  since  there  are  no  cycles.  Consider 
the  case  of  two  trees.  Suppose  one  tree  has  v  vertices  and  the  other  12  —  v.  Since  trees  have 
one  fewer  edges  than  vertices,  the  total  number  of  edges  is 

$(v - 1) + (12 - v - 1) = 10$.

For  more  than  two  trees  there  would  be  even  fewer  edges. 

(e)  A  tree  with  6  vertices  has  5  edges.  Since  the  sum  of  the  degrees  of  the  vertices  must  be  twice 
the  number  of  edges,  the  sum  of  the  degrees  must  be  10. 

(f) Such a graph must have at least $1 + e - v = 1 + 6 - 4 = 3$ cycles.

(g)  No  such  graph  exists.  If  the  graph  has  no  cycles,  then  each  component  is  a  tree.  In  such  a 
graph,  the  number  of  vertices  is  strictly  greater  than  the  number  of  edges. 

5.4.7 (a) The idea is that for a rooted planar tree of height h, having at most 2 children for each
non-leaf, the tree with the most leaves occurs when each non-leaf vertex has exactly 2 children.
You should sketch some cases and make sure you understand this point. For this case $l = 2^h$
and so $\log_2(l) = h$. Any other rooted planar tree of height h, having at most 2 children for each
non-leaf, is a subtree (with the same root) of this maximal-leaf binary tree and thus has fewer
leaves.

(b)  The  height  h  can  be  arbitrarily  large. 

(c) $h = l - 1$.

(d) $\lceil \log_2(l) \rceil$ is a lower bound for the height of any binary tree with l leaves. It is easy to see that
you can construct a full binary tree with l leaves and height $\lceil \log_2(l) \rceil$.

(e) $\lceil \log_2(l) \rceil$ is the minimal height of a binary tree.
5.4.8. We'll give four proofs. The case n = 1 is trivial since the tree is just • in this case.

(i) By Exercise 5.4.3, $\sum_{v \in V} (2 - d(v)) = 2$. Since there is more than one leaf, the root r is not
a leaf and so d(r) = 2. For a leaf u, d(u) = 1. For any other vertex w, d(w) = 3. Let there be m
nonleaf vertices other than the root. We have

$2 = \sum_{v \in V} (2 - d(v)) = (2 - 2) + n(2 - 1) + m(2 - 3) = n - m$.

Hence $m = n - 2$. The total number of vertices is $1 + n + m = 2n - 1$.

(ii) We give a proof by induction on the number of leaves. The case n = 1 was done. Otherwise, the
binary tree consists of a root joined to two other binary trees. Let the left tree have k leaves.
Then the right has $n - k$. By induction, the left tree has $2k - 1$ vertices and the right has
$2(n - k) - 1$. Adding these together and adding 1 for the root gives a total of $2n - 1$ vertices.

(iii) Every non-leaf vertex has two children. Thus the total number of vertices equals 1 (for the
root) plus twice the number of vertices which are not leaves. If there are v vertices, $v - n$ are
not leaves and so $v = 1 + 2(v - n)$. Solving, we get $v = 2n - 1$.

(iv) The number of edges is twice the number of nonleaf vertices. If there are v vertices and e
edges, then $e = 2(v - n)$. By Theorem 5.4(e), $e = v - 1$. Thus $v - 1 = 2(v - n)$. Solving, we
get $v = 2n - 1$.
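
As a sanity check on the count $2n - 1$, here is a small sketch that mirrors proof (ii). The nested-tuple representation and the function names are illustrative assumptions, not anything from the text: a leaf is the empty tuple and an internal vertex is a pair of subtrees.

```python
import random

def random_full_binary_tree(n, rng):
    """Full binary tree with n leaves: a leaf is (), an internal vertex is
    (left, right). The split k is chosen at random, as in proof (ii)."""
    if n == 1:
        return ()
    k = rng.randint(1, n - 1)  # number of leaves in the left subtree
    return (random_full_binary_tree(k, rng),
            random_full_binary_tree(n - k, rng))

def count_vertices(tree):
    """One for this vertex plus the vertices in each child subtree."""
    return 1 + sum(count_vertices(child) for child in tree)

def count_leaves(tree):
    """A vertex with no children is a leaf."""
    return 1 if tree == () else sum(count_leaves(child) for child in tree)
```

Whatever splits are chosen, `count_vertices` should return $2n - 1$ for a tree with n leaves.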

5.4.9  (a)    A  binary  tree  with  35  leaves  and  height  100  is  possible. 

(b)  A  full  binary  tree  with  21  leaves  can  have  height  at  most  20.  So  such  a  tree  of  height  21  is 
impossible. 

(c)  A  binary  tree  of  height  5  can  have  at  most  32  leaves.  So  one  with  33  leaves  is  impossible. 

(d) A full binary tree with 65 leaves has minimal height $\lceil \log_2(65) \rceil = 7$. Thus a full binary tree
with 65 leaves and height 6 is impossible.
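
The bounds used in parts (a) through (d) can be packaged as three one-line helpers. This is only a sketch; the function names are ours, and the ceiling of the logarithm is computed with integer arithmetic to avoid floating-point rounding.

```python
def min_height(leaves):
    """Smallest possible height of a binary tree with this many leaves,
    that is, ceil(log2(leaves)), computed exactly with integers."""
    return (leaves - 1).bit_length()

def max_leaves(height):
    """Largest possible number of leaves in a binary tree of this height."""
    return 2 ** height

def max_height_full(leaves):
    """Largest possible height of a full binary tree with this many leaves."""
    return leaves - 1
```

With these, (b) fails because `max_height_full(21)` is below 21, (c) fails because `max_leaves(5)` is below 33, and (d) fails because `min_height(65)` exceeds 6.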

5.4.10. The maximal number of vertices is $1 + k + k^2 + \cdots + k^h = (k^{h+1} - 1)/(k - 1)$. The maximal
number of leaves is $k^h$.

5.4.11  (a)    Breadth-first:  MIAJKCEHLBFGD, 

Depth-first:  MICIEIHFHGHDHIMAMJMKLKBKM, 

Pre-order:  MICEHFGDAJKLB, 
Post-order:  CEFGDHIAJLBKM . 

(b)  The  tree  is  the  same  as  in  part  (a),  reflected  about  the  vertical  axis,  with  vertices  A  and  J 
removed. 

(c)  It  is  not  possible  to  reconstruct  a  rooted  plane  tree  given  just  its  pre-order  vertex  list.  A 
counterexample  can  be  found  using  just  three  vertices. 

(d)  It  is  possible  to  reconstruct  a  rooted  plane  tree  given  its  pre-order  and  post-order  vertex  list. 
If  the  root  is  X  and  the  first  child  of  the  root  is  Y ,  it  is  possible  to  reconstruct  the  pre-order 
and  post-order  vertex  lists  of  the  subtree  rooted  at  Y  from  the  pre-order  and  post-order  vertex 
lists  of  the  tree.  In  the  same  manner,  you  can  reconstruct  the  pre-order  and  post-order  vertex 
lists  of  the  subtrees  rooted  at  the  other  children  of  the  root  X.  Now  do  the  same  trick  on  these 
subtrees.  Try  this  approach  on  an  example. 
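
The recursive idea in part (d) can be sketched as follows. This is an illustrative implementation under our own assumptions (distinct vertex labels, trees stored as `(label, [children])` pairs, hypothetical function names), tried on the lists from part (a).

```python
from collections import deque

def rebuild(pre, post):
    """Rebuild a rooted plane tree as (label, [children]) from its
    pre-order and post-order vertex lists (labels must be distinct)."""
    root, children = pre[0], []
    i, j = 1, 0  # next unread positions in pre and post
    while i < len(pre):
        end = post.index(pre[i], j)  # the child's subtree ends here in post
        size = end - j + 1           # number of vertices in that subtree
        children.append(rebuild(pre[i:i + size], post[j:end + 1]))
        i, j = i + size, end + 1
    return root, children

def breadth_first(tree):
    """Breadth-first vertex list of a tree returned by rebuild."""
    order, queue = [], deque([tree])
    while queue:
        label, children = queue.popleft()
        order.append(label)
        queue.extend(children)
    return "".join(order)
```

Feeding in the pre-order list MICEHFGDAJKLB and the post-order list CEFGDHIAJLBKM and then reading the rebuilt tree breadth-first should reproduce the breadth-first list given in part (a).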

5.4.12. The statement of the exercise associates a Prüfer sequence with every tree. To prove that
this is a bijection, we must show that it is an injection and a surjection. If we use the fact that there
are $n^{n-2}$ trees, we do not need to do both: A function $f : A \to B$ with $|A| = |B|$ is a surjection if
and only if it is an injection. On the other hand, if we want to prove that there are $n^{n-2}$ trees, we
need to do both.
Note first that it does not matter what labels are used for the vertices as long as the labels are
ordered so we can find the largest. Suppose the hint has been proved.

The proof of the bijection is by induction on the number of vertices. Given a Prüfer sequence,
we can determine the first vertex removed. Thus we know which vertices are left. Furthermore, the
Prüfer sequence for this (n - 1)-vertex tree is the original sequence with the first entry removed.
Using this you should be able to prove inductively that every Prüfer sequence gives a unique tree.
Thus the function from trees to Prüfer sequences is a bijection.

We now prove the suggestion in the hint. Clearly the first vertex removed cannot be in the
Prüfer sequence. Consider a vertex not in the Prüfer sequence. Since it does not appear in the
sequence, no vertex attached to it was ever removed. Hence it must be a leaf. Since the largest leaf
is removed, we are done.
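
The inductive argument translates directly into an encoder and a decoder. The sketch below rests on an assumption about the exercise's convention that the solution only implies: at each step the largest leaf is removed and its neighbor is recorded, for $n - 2$ steps on vertices labeled 1 to n. The function names are ours.

```python
from itertools import product

def encode(edges, n):
    """Pruefer-style sequence of a tree on vertices 1..n: repeatedly
    remove the largest leaf and record the vertex it was attached to."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seq = []
    for _ in range(n - 2):
        leaf = max(v for v in adj if len(adj[v]) == 1)
        (neighbor,) = adj[leaf]
        seq.append(neighbor)
        adj[neighbor].remove(leaf)
        del adj[leaf]
    return seq

def decode(seq, n):
    """Inverse of encode: by the hint, the vertex removed next is the
    largest remaining vertex absent from the rest of the sequence."""
    remaining, edges = set(range(1, n + 1)), []
    for idx, s in enumerate(seq):
        later = set(seq[idx:])
        leaf = max(v for v in remaining if v not in later)
        edges.append((leaf, s))
        remaining.remove(leaf)
    edges.append(tuple(sorted(remaining)))  # the last two vertices are joined
    return edges
```

Running `decode` over every sequence of length $n - 2$ for a small n and checking that `encode` returns the original sequence, with all decoded trees distinct, illustrates the bijection and the count $n^{n-2}$.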

Section  5.5 

5.5.1. Let D be the domain suggested in the hint and define $f : D \to \mathcal{P}_2(V)$ by $f((x, y)) = \{x, y\}$.
Let $G(D) = (V, \psi)$ where $\psi(e) = f(\varphi(e))$.

5.5.2. In each case, it is a matter of choosing subsets of $V \times V$ for the directed edges.

(a) There are $|V \times V| = n^2$ potential edges to choose from. Since there are two choices for each edge
(either in the digraph or not), we get $2^{n^2}$ simple digraphs.

(b) With loops forbidden, our possible edges include all elements of $V \times V$ except those of the form
(v, v) with $v \in V$. Thus there are $2^{n(n-1)}$ loopless simple digraphs. An alternative derivation
is to note that a simple graph has $\binom{n}{2}$ possible edges and, for each, we have 4 possible choices
in constructing a digraph: (i) omit the edge, (ii) include the edge directed one way, (iii) include
the edge directed the other way, and (iv) include two edges, one directed each way. This gives

$4^{\binom{n}{2}} = 2^{n(n-1)}$.

The latter approach is not useful in doing part (c).

(c) Given the set S of possible edges, we want to choose q of them. This can be done in $\binom{|S|}{q}$ ways.
In the general case, the number is $\binom{n^2}{q}$ and in the loopless case it is $\binom{n(n-1)}{q}$.
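
For small n these counts can be confirmed by listing every edge set. The sketch below is only an illustration; the function name and its parameters are our own, and the enumeration is exponential, so it is usable only for n up to 3 or so.

```python
from itertools import combinations, product
from math import comb

def count_simple_digraphs(n, loops_allowed=True, q=None):
    """Count edge sets that are subsets of V x V by explicit enumeration.
    If q is given, only edge sets with exactly q edges are counted."""
    possible = [(u, v) for u, v in product(range(n), repeat=2)
                if loops_allowed or u != v]
    sizes = range(len(possible) + 1) if q is None else [q]
    return sum(1 for size in sizes for _ in combinations(possible, size))
```

For n = 3 the function should agree with $2^{n^2}$, $2^{n(n-1)}$, and the two binomial coefficients of part (c).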

5.5.3. Let V = {u, v} and E = {(u, v), (v, u)}.

5.5.4. For each $\{u, v\} \in \mathcal{P}_2(V)$ we have three choices: select the edge (u, v), select the edge (v, u)
or have no edge between u and v. Let $N = |\mathcal{P}_2(V)| = \binom{n}{2}$.

(a) There are $3^N$ oriented simple graphs.

(b) We can choose q elements of $\mathcal{P}_2(V)$ and then orient each of them in one of two ways. This
gives us $\binom{N}{q} 2^q$.
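
The two answers are consistent with each other: summing the count in (b) over all possible numbers of edges q recovers the count in (a) by the binomial theorem. As a worked check (our own addition), with n = 3 we have N = 3:

```latex
\sum_{q=0}^{N} \binom{N}{q} 2^q = (1 + 2)^N = 3^N,
\qquad\text{e.g. } N = 3:\quad 1 + 3\cdot 2 + 3\cdot 4 + 1\cdot 8 = 27 = 3^3 .
```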

5.5.5. You can use the notation and proof of Example 5.5 provided you change all references to
two element sets to references to ordered pairs. This means replacing $\{x, y\}$ with $(x, y)$, $\{\nu(x), \nu(y)\}$
with $(\nu(x), \nu(y))$ and $\mathcal{P}_2(V_i)$ with $V_i \times V_i$.

5.5.6 (a) If $x, y \in V$ and $x \ne y$, there must be a directed path from x to y in D since D is strongly
connected. In S(D), this becomes a walk from x to y. Hence S(D) is connected.

(b) Here's the simplest solution: V = {x, y} and E = {(x, y)}.

(c) Let $u_1 \in V_1$ and $u_2 \in V_2$. Since D is strongly connected, there is a directed path from $u_1$ to $u_2$.
This path must somehow cross from $V_1$ to $V_2$ and so there is an edge from $V_1$ to $V_2$. Similarly,
there's an edge from $V_2$ to $V_1$.

If you prefer, here's a more formal proof. Let $v_1$ be the last vertex on the path that is in
$V_1$. Since $u_2 \notin V_1$, $v_1$ is not the end of the path. Let $v_2$ be the next vertex on the path after $v_1$.
By the definition of $v_1$, we have $v_2 \notin V_1$ and so $v_2 \in V_2$. Since $(v_1, v_2)$ is an edge of D, there is
an edge from $V_1$ to $V_2$. Interchanging the roles of $V_1$ and $V_2$ proves that there is also an edge
from $V_2$ to $V_1$.

(d) Here are two different proofs.

First proof. Let u be a vertex and let $V_1$ contain all vertices that can be reached from u. Let
$V_2$ be the remaining vertices. If $V_2 = \emptyset$, we are done. We now suppose $V_2 \ne \emptyset$ and obtain a
contradiction. Since D is 2-way joined, there is an edge $(w_1, w_2)$ with $w_1 \in V_1$ and $w_2 \in V_2$.
By the definition of $V_1$, either $w_1 = u$ or there is a directed path from u to $w_1$. Since $(w_1, w_2)$
is an edge, it follows that there is a directed path from u to $w_2$. This contradicts the definition
of $V_2$ as those vertices which cannot be reached from u. It follows that the assumption
$V_2 \ne \emptyset$ is false.

Second proof. Suppose that D is not strongly connected. We will find a partition that is
not 2-way joined. That will prove the result by proving the contrapositive. By assumption,
there must be vertices $v_1$ and $v_2$ in D such that there is no directed path from $v_1$ to $v_2$. Let
$V_1$ contain $v_1$ and all vertices that can be reached from $v_1$ via directed paths. Let $V_2$ be the
remaining vertices. Since $v_2 \in V_2$, we have a partition of V. If $(v, w)$ were an edge from $V_1$ to
$V_2$, the path from $v_1$ to v followed by the edge $(v, w)$ would give us a contradiction. Since no
such edge $(v, w)$ can exist, the partition is not 2-way joined.

5.5.7. "The statements are all equivalent" means that, given any two statements v and w, we
have a proof that v implies w. Suppose D is strongly connected. Then there is a directed path
$v = v_1, v_2, \ldots, v_k = w$. That means we have proved $v_1$ implies $v_2$, that $v_2$ implies $v_3$ and so on.
Hence $v_1$ implies $v_k$.

5.5.8 (a) The value of $d_{in}(\{v\})$ is the number of edges that have their "heads" at v.

(b) Both sums equal the number of edges in D.

(c) Suppose $u \in U$. Every edge $(v, u)$ contributes 1 to $d_{in}(\{u\})$ but it contributes 1 to $d_{in}(U)$ only
when $v \notin U$. Hence $\sum_{u \in U} d_{in}(\{u\})$ exceeds $d_{in}(U)$ by the number of edges $(v, u)$ with $v \in U$
and $u \in U$, which is what we were asked to prove.

(d) There is a result like (c) for $d_{out}$. Let $e(U)$ be the number of edges in D that have both their
end points in U. We have

$d_{in}(U) = \sum_{u \in U} d_{in}(u) - e(U) = \sum_{u \in U} d_{out}(u) - e(U) = d_{out}(U)$.

5.5.9. Let $e = (u_1, u_2)$. For i = 2, 3, ..., as long as $u_i \ne u_1$ choose an edge $(u_i, u_{i+1})$ that has not
been used so far. It is not hard to see that $d_{in}(u_i) = d_{out}(u_i)$ implies this can be done. In this way we
obtain a directed trail starting and ending at $u_1$. This may not be a cycle, but a cycle containing e
can be extracted from it by deleting some edges.

5.5.10.  Use  the  idea  in  the  previous  exercise  plus  induction  on  the  number  of  edges  to  partition  the 
edges  of  D  into  directed  trails  with  the  starting  vertex  of  each  trail  equal  to  its  final  vertex.  We  will 
prove  that  if  there  is  more  than  one  trail,  then  two  of  the  trails  can  be  combined  into  a  single  trail. 
It  follows  that  we  may  assume  there  is  only  one  trail.  To  prove  our  combining  claim,  note  that  since 
S{D)  is  connected,  there  must  be  two  trails  that  have  a  vertex  in  common.  It  is  not  hard  to  see  how 
to  join  them  into  one  directed  trail. 

5.5.11  (a) 

(b)   See  Exercise  6.3.14  (p.  166). 
5.5.12 (a) There are $2^{n^2 - n}$ reflexive binary relations on a set of n elements.

(b) There are $2^{(n^2 - n)/2}$ reflexive and symmetric relations R on a set of n elements.

(c) There are $2^{(n^2 - n)/2}$ unreflexive and symmetric relations R on a set of n elements.

(d) There are $2^{(n^2 + n)/2}$ symmetric relations and $2^{(n^2 - n)/2}$ reflexive and symmetric relations. Take
the difference and you get $2^{(n^2 - n)/2}(2^n - 1)$.
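
These formulas are easy to confirm for small n by listing all $2^{n^2}$ relations. The sketch below is our own illustration; it assumes that "unreflexive" in (c) means that no pair (x, x) belongs to the relation, and the dictionary keys are hypothetical names.

```python
from itertools import product

def count_relations(n):
    """Brute-force counts of binary relations on {0, ..., n-1} by property."""
    pairs = list(product(range(n), repeat=2))
    counts = {"reflexive": 0, "symmetric": 0, "reflexive_symmetric": 0,
              "irreflexive_symmetric": 0, "symmetric_not_reflexive": 0}
    for bits in product([False, True], repeat=len(pairs)):
        rel = {p for p, keep in zip(pairs, bits) if keep}
        reflexive = all((x, x) in rel for x in range(n))
        irreflexive = all((x, x) not in rel for x in range(n))
        symmetric = all((y, x) in rel for (x, y) in rel)
        counts["reflexive"] += reflexive
        counts["symmetric"] += symmetric
        counts["reflexive_symmetric"] += reflexive and symmetric
        counts["irreflexive_symmetric"] += irreflexive and symmetric
        counts["symmetric_not_reflexive"] += symmetric and not reflexive
    return counts
```

For n = 3 the five counts should match the closed forms in (a) through (d).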

5.5.13 (a) For all $x \in S$, $x \mid x$. For all $x, y \in S$, if $x \mid y$ and $x \ne y$, then y does not divide x. For all
$x, y, z \in S$, $x \mid y$ and $y \mid z$ imply that $x \mid z$.

(b)    The  covering  relation 

H  =  {(2,4),  (2, 6),  (2, 10),  (2, 14),  (3,6),  (3,9),  (3, 15),  (4,8),  (4, 12),  (5, 10),  (5, 15),  (6, 12),  (7, 14)}. 
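
The covering relation can be computed mechanically from the divides relation. In the sketch below the ground set {2, ..., 15} is an assumption inferred from the pairs listed in H (the exercise statement is not reproduced here), and the function name is ours.

```python
def covering_relation(elements):
    """Pairs (x, y) with x dividing y, x != y, and no z in the set lying
    strictly between them in the divides order."""
    def divides(a, b):
        return b % a == 0
    return [(x, y) for x in elements for y in elements
            if x != y and divides(x, y)
            and not any(z not in (x, y) and divides(x, z) and divides(z, y)
                        for z in elements)]
```

Calling `covering_relation(range(2, 16))` should return exactly the thirteen pairs of H.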

5.5.14.  The  transitive  closure  of  H  is  the  divides  relation  on 
S  =  {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}. 

5.5.15 (a) There are $n^{n-2}$ trees. Since a tree with n vertices has $n - 1$ edges, the answer is zero if
$q \ne n - 1$. If $q = n - 1$, there are $\binom{\binom{n}{2}}{n-1}$ graphs. Thus the answer is $n^{n-2} \binom{\binom{n}{2}}{n-1}^{-1}$ when $q = n - 1$.
(b)   We  have 

\n-l)        (n-1)!  ~     2"-i  (n-1)!     ^   2»-Ve"-i   "  vYJ 
Using  this  in  the  answer  to  (a)  gives  the  result  we  want.  It  turns  out  that 

which  differs  from  our  estimate  by  a  constant  times  n^^^. 

Section  5.6 

5.6.1. Let A and B be the partition of the vertices guaranteed by the definition of a bipartite graph.
Let $k = |A|$, number the vertices in A with 1 to k and those in B with k + 1 to n. Since no edges
connect vertices in A to each other, A(G) has a $k \times k$ block of zeroes in its upper left corner. Similarly
B gives a block in the lower right corner.

5.6.2. Since there are no edges between connected components of G, we have lots of zeroes. More
specifically, A(G) is a matrix of zeroes except for blocks along the diagonal. The ith block is the
$n_i \times n_i$ matrix $A(G_i)$.

5.6.3 (a) $a^{(k)}_{i,j}$ is the sum over all $t_1, \ldots, t_{k-1}$ of $a_{i,t_1} a_{t_1,t_2} \cdots a_{t_{k-1},j}$. Each of these products is 0 or
1, so the sum is nonzero if and only if some product is nonzero. This happens if and only if
each factor in the product is nonzero. This happens if and only if the vertices $i, t_1, \ldots, t_{k-1}, j$
form a walk.

(b)  We  can  construct  a  path  from  a  walk  by  jumping  over  pieces  that  form  cycles.  Thus  the  shortest 
walk  from  i  to  j  is  a  path.  Here's  a  more  formal  argument.  Suppose  that  W  =  {i,t, . . . ,  v,j)  is 
the  shortest  walk  from  i  to  j.  If  it  is  not  a  path,  then  there  must  be  repeated  vertices  in  the 
list. Let u be such a vertex. Remove all vertices from the sequence after the first occurrence of
u up to and including the last occurrence of u. The result is a shorter walk, contradicting the
minimality  of  W. 

(c) The obvious idea is to repeat the previous statement with i = j: "The shortest walk from i to
i is a cycle." This is not true. If {i, j} is an edge, then i, j, i is the shortest walk from i to i but
it is not a cycle. The result would be true if we were looking at oriented simple graphs because
an edge can be traversed in only one direction. All we can claim is that any odd length walk
from i to i contains a cycle.

We can modify the situation a bit by looking at an edge {i, j} of the graph. Let H be
the graph obtained by removing it; i.e., by setting $a_{i,j} = a_{j,i} = 0$. The shortest walk from j to
i in H together with the edge {i, j} is a cycle of G. This follows from the previous result and
the definitions of path and cycle.

(d) Following the hint, $B^k = \sum_{t=0}^{k} \binom{k}{t} A^t$ by the binomial theorem. Since $\binom{k}{t} > 0$, $b^{(k)}_{i,j}$ is nonzero
if and only if $a^{(t)}_{i,j} \ne 0$ for some t with $0 \le t \le k$. t = 0 gives the identity matrix, so $b^{(k)}_{i,i} \ne 0$ for
all k. For $i \ne j$, $b^{(k)}_{i,j} \ne 0$ if and only if there is a walk of length t from i to j for some $t \le k$, and thus if and
only if there is a path of length t for some $t \le k$. Since paths of length t contain t + 1 distinct vertices,
no path is longer than $n - 1$. Thus there is a path from i to $j \ne i$ if and only if $b^{(k)}_{i,j} \ne 0$ for all
$k \ge n - 1$.
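
The conclusion of part (d) gives a simple, if inefficient, reachability test. The sketch below assumes, consistent with the binomial expansion above, that the hint's matrix is B = I + A; the function names are ours, and plain Python lists stand in for matrices.

```python
def mat_mul(X, Y):
    """Product of two square integer matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def reachable(adj):
    """Nonzero pattern of (I + A)^(n-1): entry (i, j) is True exactly
    when i = j or some path joins i to j."""
    n = len(adj)
    B = [[adj[i][j] + (i == j) for j in range(n)] for i in range(n)]
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(n - 1):
        P = mat_mul(P, B)
    return [[entry != 0 for entry in row] for row in P]
```

For a path on vertices 0, 1, 2 together with an isolated vertex 3, the first three vertices should all reach one another while vertex 3 reaches only itself.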

5.6.4.  The  arguments  given  for  simple  graphs  carry  over.  Nothing  can  be  said  about  cycles  for 
simple  directed  graphs. 

5.6.5.  We  claim  that  A{D)  is  nilpotent  if  and  only  if  there  is  no  vertex  i  such  that  there  is  a  walk 
from  i  to  i  (except  the  trivial  walk  consisting  of  just  i). 

First suppose that there is a nontrivial walk from i to i containing k edges. Let $C = A(D)^k$. It
follows that all entries of C are nonnegative and $c_{i,i} \ne 0$. Thus $c^{(m)}_{i,i} \ne 0$ for all m > 0. Hence A(D)
is not nilpotent.

Conversely, suppose that A(D) is not nilpotent. Let n be the number of vertices in D and
suppose that i and j are such that $a^{(n)}_{i,j} \ne 0$, which we can do since A(D) is not nilpotent. There
must be a walk $i = v_0, v_1, v_2, \ldots, v_n = j$. Since this sequence contains n + 1 vertices, there must be
a repeated vertex. Suppose that $k < l$ and $v_k = v_l$. The sequence $v_k, v_{k+1}, \ldots, v_l$ is a nontrivial walk
from $v_k$ to itself.
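
The claim can be tried out on small digraphs. The sketch below (our own helper, not from the text) uses the standard fact that an $n \times n$ matrix is nilpotent exactly when its nth power is zero, which is also what the second half of the proof relies on.

```python
def is_nilpotent(adj):
    """True when A^n = 0, where A is the n x n adjacency matrix adj."""
    n = len(adj)
    power = [row[:] for row in adj]  # holds A^1
    for _ in range(n - 1):           # multiply up to A^n
        power = [[sum(power[i][t] * adj[t][j] for t in range(n))
                  for j in range(n)] for i in range(n)]
    return all(entry == 0 for row in power for entry in row)
```

The directed path 0 → 1 → 2 has no nontrivial closed walk and should be reported nilpotent, while the directed triangle 0 → 1 → 2 → 0 should not.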

Section  6.1 

6.1.1  (a)  One  description  of  a  tree  is:  a  connected  graph  such  that  removal  of  any  edge  disconnects 
the  tree.  Since  an  edge  connects  only  two  vertices,  we  will  obtain  only  two  components  by 
removing  it. 

(b) Note that T with e removed and f added is a spanning tree. Since T has minimum weight, the
result follows.

(c) The graph must have a cycle containing e. Since one end of e is in $T_1$ and the other in $T_2$, the
cycle must contain another connector besides e.

(d) Since T* with e removed and f added is a spanning tree, the algorithm would have removed f
instead of e if $\lambda(f) > \lambda(e)$.

(e) By (b) and (d), $\lambda(f) = \lambda(e)$. Since adding f connects $T_1$ and $T_2$, the result is a spanning tree.

(f)  Suppose  T*  is  not  a  minimum  weight  spanning  tree.  Let  T  be  a  minimum  weight  spanning  tree 
so  that  the  event  in  (a)  occurs  as  late  as  possible.  It  was  proven  in  (e)  that  we  can  replace  T 
with  another  minimum  weight  spanning  tree  such  that  the  disagreement  between  T  and  T*,  if 
any,  occurs  later  in  the  algorithm.  This  contradicts  the  definition  of  T. 
6.1.2. Note that $B_1$ and $B_2$ cannot have any edges in common since they are different equivalence
classes of edges. Suppose that u and v are vertices in $B_1 \cap B_2$. We will derive a contradiction by finding
a cycle containing edges of $B_1$ and edges of $B_2$. There exist vertices $w_{i,1}$ such that $e_i = \{u, w_{i,1}\}$ is
an edge of $B_i$. There also exist edges $f_i$ in $B_i$ that have v as an end vertex. Since $e_i \sim f_i$, there is a
cycle in $B_i$ containing $e_i$ and $f_i$. Let the vertices on the cycle in $B_i$ be $u, w_{i,1}, w_{i,2}, \ldots, v, \ldots$. Let x
be the first vertex in the $B_1$ cycle other than u which lies in the $B_2$ cycle. (There must be one since
both cycles contain v.) If $x = w_{1,1} = w_{2,1}$, we are done since then $e_1 = e_2$ is in both $B_1$ and $B_2$, a
contradiction. Otherwise,

$u, \ldots, x, \ldots, w_{2,j}, w_{2,j-1}, \ldots, w_{2,1}, u$

is a cycle containing $e_1$ and $e_2$. Thus $e_1 \sim e_2$, a contradiction.

6.1.3 (b) Let $Q_1$ and $Q_2$ be two bicomponents of G, let $v_1$ be a vertex of $Q_1$, and let $v_2$ be a vertex
of $Q_2$. Since G is connected, there is a path in G from $v_1$ to $v_2$, say $x_1, \ldots, x_p$. You should
convince yourself that the following pseudocode constructs a walk $w_1, w_2, \ldots$ in B(G) from $Q_1$
to $Q_2$.

    Set w_1 = Q_1, j = 2, and k = 0.
    While there is an x_i in P(G) with i > k
        Let i > k be the least i for which x_i is in P(G).
        If i = p
            Set Q = Q_2.
        Else
            Let Q be the bicomponent containing {x_i, x_{i+1}}.
        End if
        Set w_j = x_i, w_{j+1} = Q, k = i, and j = j + 2.
    End while

(c) Suppose there is a cycle in B(G), say $v_1, Q_1, \ldots, v_k, Q_k, v_1$, where the $Q_i$ are distinct bicomponents
and the $v_i$ are distinct vertices. Set $v_{k+1} = v_1$. By the definitions, there is a path in $Q_i$
from $v_i$ to $v_{i+1}$. Replace each $Q_i$ in the previous cycle with these paths after removing the endpoints
$v_i$ and $v_{i+1}$ from the paths. The result is a cycle in G. Since this is a cycle, all vertices
on it lie in the same bicomponent, which is a contradiction since the original cycle contained
more than one $Q_i$.

(d) Let v be an articulation point of the simple graph G. By definition, there are vertices x and
y such that every path from x to y contains v. From this one can prove that there are edges
e = {v, x'} and f = {v, y'} such that every path from x' to y' contains v. It follows that e and
f are in different bicomponents. Thus v lies in more than one bicomponent.

Suppose that v lies in two bicomponents. There are edges e = {v, w} and f = {v, z} such
that $e \not\sim f$. It follows that every path from w to z contains v and so v is an articulation point.

6.1.4. When we removed e from the minimum weight spanning tree and added f, the result was
still a minimum weight spanning tree. It follows that $\lambda(e) = \lambda(f)$, which contradicts $e \ne f$.

6.1.5 (a) Since there are no cycles, each component must be a tree. If a component has $n_i$ vertices,
then it has $n_i - 1$ edges since it is a tree. Since $\sum n_i$ over all components is n and $\sum (n_i - 1)$
over all components is k, $n - k$ is the number of components.

(b) By the previous part, $H_{k+1}$ has one less component than $G_k$ does. Thus at least one component
C of $H_{k+1}$ has vertices from two or more components of $G_k$. By the connectivity of C, there
must be an edge e of C that joins vertices from different components of $G_k$. If this edge is
added to $G_k$, no cycles arise.
(c) By the definition of the algorithm, it is clear that $\lambda(g_1) \le \lambda(e_1)$. Suppose that $\lambda(g_i) \le \lambda(e_i)$
for $1 \le i \le k$. By the previous part, there is some $e_j$ with $1 \le j \le k + 1$ such that $G_k$ together
with $e_j$ has no cycles. By the definition of the algorithm, it follows that $\lambda(g_{k+1}) \le \lambda(e_j)$. Since
$\lambda(e_j) \le \lambda(e_{k+1})$ by the definition of the $e_i$'s, we are done.
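
For readers who want to experiment, here is a minimal sketch of a greedy algorithm of the kind this argument describes. It assumes, and this is our reading rather than something the solution states, that the algorithm repeatedly adds the lightest remaining edge that creates no cycle. The function name and the union-find bookkeeping are our own.

```python
def greedy_spanning_forest(n, weighted_edges):
    """weighted_edges: iterable of (weight, u, v) with vertices 0..n-1.
    Returns the chosen edges g_1, g_2, ... in the order they are added."""
    parent = list(range(n))

    def find(x):
        # Representative of x's component, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for weight, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # adding the edge creates no cycle
            parent[ru] = rv
            chosen.append((weight, u, v))
    return chosen
```

By part (a), after k edges have been chosen the forest has n - k components, so a connected graph yields n - 1 chosen edges.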

6.1.6. In the notation of the previous exercise, the edges belong to a minimum weight spanning
tree if and only if $\lambda(g_i) = \lambda(e_i)$ for $1 \le i < n$. Since $\lambda$ is an injection, it follows that $g_i = e_i$ for
$1 \le i < n$.

6.1.7  (a)    Hint:  For  (1)  there  are  four  spanning  trees.  For  (2)  there  are  8  spanning  trees.  For  (3) 

there  are  16  spanning  trees. 

(b)  Hint:  For  (1)  there  is  one.  For  (2)  there  are  two.  For  (3)  there  are  two. 

(c)  Hint:  For  (1)  there  are  two.  For  (2)  there  are  four.  For  (3)  there  are  6. 

(d)  Hint:  For  (1)  there  are  two.  For  (2)  there  are  three.  For  (3)  there  are  6. 

6.1.8  (a)   Hint:  For  (1)  there  are  three  minimal  spanning  trees.  For  (2)  there  are  2  spanning  trees. 

For  (3)  there  is  1  minimal  spanning  tree. 

(b)  Hint:  For  (1)  there  is  one.  For  (2)  there  are  two.  For  (3)  there  is  one. 

(c) Hint: For (1) there is one. For (2) there is one. For (3) there are four.

(d)  Hint:  For  (1)  there  is  one.  For  (2)  there  is  one.  For  (3)  there  are  four. 

6.1.9  (a)   Hint:  There  are  21  vertices,  so  the  minimal  spanning  tree  has  20  edges.  Its  weight  is  30. 

(b) Hint: Its weight is 30.

(c)  Hint:  Its  weight  is  30. 

(d) Hint: Note that K is the only vertex in common to the two bicomponents of this graph.
Whenever  this  happens  (two  bicomponents,  common  vertex),  the  depth-first  spanning  tree 
rooted  at  that  common  vertex  has  exactly  two  "principal  subtrees"  at  the  root.  In  other  words, 
the  root  of  the  depth-first  spanning  tree  has  degree  two.  Finding  depth  first  spanning  trees  of 
minimal  weight  is,  in  general,  difficult.  You  might  try  it  on  this  example. 

Section  6.2 

6.2.1.  This  is  just  a  matter  of  a  little  algebra. 

6.2.2.  The  answer  is  x{x  —  1)"^^  for  a  tree  with  n  vertices.  We'll  give  a  three  methods. 

First method. Imagine the tree being rooted and color it working from the root. The root can
be colored in x ways. When we reach another vertex, its parent is the only vertex adjacent to it
which has been colored, so it can be colored in $x - 1$ ways.

Second method. We'll use induction on the number of vertices. One vertex is trivial. For more
than one vertex, the tree T must have a leaf v. By induction, $T - v$ can be colored in $x(x-1)^{n-2}$
ways. By definition, a leaf is a vertex which is joined to the rest of the tree by just one edge; thus v
can be colored in $x - 1$ ways.

Third method. We'll use induction on the number of vertices. One and two vertices are trivial.
If there are more than two vertices in the tree T, it has a vertex v of degree greater than one. We
can split T into two trees H and K which share only the vertex v and which each have fewer leaves
than T. By Exercise 6.2.3 below, the result follows.

6.2.3 (a) To color G, first color the vertices of H AND then color the vertices of K. By the Rule
of Product, $P_G(x) = P_H(x) P_K(x)$.

(b) Let v be the common vertex. There is an obvious bijection between pairs of colorings $(\lambda_H, \lambda_K)$
of H and K with $\lambda_H(v) = \lambda_K(v)$ and colorings of G. We claim the number of such pairs is
$P_H(x)(P_K(x)/x)$. To see this, note that, in the colorings of K counted by $P_K(x)$, each of the
x ways to color v occurs equally often and so $1/x$ of the colorings will have $\lambda_K(v)$ equal to the
color given by $\lambda_H(v)$.

(c) The answer is $P_H(x) P_K(x)(x-1)/x$. We can prove this directly, but we can also use (b) and
(6.4) as follows. Let $e = \{v, w\}$. By the construction of G, $P_{G-e}(x) = P_H(x) P_K(x)$. By (b),
$P_{G_e}(x) = P_H(x) P_K(x)/x$. Now apply (6.4).

6.2.4.  By  Exercise  6.2.3,  Pg{x)  =  Pzk+,{x)Pz„.,+Ax)/x{x  -  1). 

6.2.5. Let the solution be $P_n(x)$. Clearly $P_1(x) = x(x-1)$, so we may suppose that $n \ge 2$. Apply
deletion and contraction to the edge {(1, 1), (1, 2)}. Deletion gives a ladder with two ends sticking out
and so its chromatic polynomial is $(x-1)^2 P_{n-1}(x)$. Contraction gives a ladder with the contracted
vertex joined to two adjacent vertices. Once the ladder is colored, there are $x - 2$ ways to color the
contracted vertex. Thus we have

$P_n(x) = (x-1)^2 P_{n-1}(x) - (x-2) P_{n-1}(x) = (x^2 - 3x + 3) P_{n-1}(x).$

The value for $P_n(x)$ now follows easily.
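
The recursion can be checked numerically. The sketch below is only an illustration: it assumes the ladder has vertices (j, s) for rung j = 1, ..., n and side s = 1, 2 (a labeling chosen here, not taken from the text), counts proper colorings by brute force, and compares the count with the closed form $x(x-1)(x^2-3x+3)^{n-1}$ that the recursion gives.

```python
from itertools import product

def ladder_edges(n):
    """Edges of a ladder with n rungs; vertex (j, s) lies on rung j, side s (assumed labeling)."""
    rungs = [((j, 1), (j, 2)) for j in range(1, n + 1)]
    rails = [((j, s), (j + 1, s)) for j in range(1, n) for s in (1, 2)]
    return rungs + rails

def count_colorings(n, x):
    """Count proper colorings of the n-rung ladder with x colors by trying every assignment."""
    vertices = [(j, s) for j in range(1, n + 1) for s in (1, 2)]
    edges = ladder_edges(n)
    total = 0
    for colors in product(range(x), repeat=len(vertices)):
        coloring = dict(zip(vertices, colors))
        if all(coloring[u] != coloring[v] for u, v in edges):
            total += 1
    return total

def ladder_formula(n, x):
    """Closed form obtained by iterating P_n = (x^2 - 3x + 3) P_{n-1} from P_1 = x(x - 1)."""
    return x * (x - 1) * (x * x - 3 * x + 3) ** (n - 1)
```

Brute force is exponential in the number of vertices, so this is only practical for a few rungs and colors; it is meant as a sanity check, not as a way to compute chromatic polynomials.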

6.2.6. Use deletion and contraction on the edge {(1, 2), (2, 2)} and then on the edge {(2, 2), (3, 2)}.
Two  contractions  give  two  easy  graphs  that  have  a  common  vertex.  A  contraction  and  a  deletion 
in  either  order  gives  Zq  with  two  vertices  joined.  By  coloring  Zq  and  then  those  two,  you  get  the 
chromatic  polynomial  for  Zf,  times  (.x  — 2)^.  After  two  deletions,  use  the  edge  {(2, 1),  (2, 2)}.  Deletion 
gives  Zg  with  a  vertex  joined  to  it  by  a  single  edge.  Contraction  gives  two  copies  of  Z^  sharing  an 
edge.  All  the  graphs  we  have  obtained  have  chromatic  polynomials  that  are  easy  to  compute  by 
previous  results. 

6.2.7.  The  answer  is 

$x^8 - 12x^7 + 66x^6 - 214x^5 + 441x^4 - 572x^3 + 423x^2 - 133x.$

There seems to be no really easy way to derive this. Here's one approach which makes use of
Exercise 6.2.3 and $P_{Z_n}(x)$ for n = 3, 4, 5. Label the vertices reading around one face with a, b, c, d
and around the opposite face with A, B, C, D so that {a, A} is an edge, etc. If the edge {a, A} is
contracted, call the new vertex $\alpha$. Introduce $\beta$, $\gamma$ and $\delta$ similarly.

Let $e_1 = \{a, A\}$ and $e_2 = \{b, B\}$. Note that $G - e_1 - e_2$ consists of three squares joined by
common edges and that $H = G_{e_1} - e_2$ is equivalent to $(G - e_1)_{e_2}$. We do H in the next paragraph.
In $K = G_{e_1 e_2}$, let $f = \{\alpha, \beta\}$. $K - f$ is two triangles and a square joined by common edges and $K_f$
is a square and a vertex v joined to the vertices of the square. By first coloring v and then the square,
we see that $P_{K_f}(x) = x P_{Z_4}(x - 1)$.

Let $f_1 = \{c, C\}$, $f_2 = \{d, D\}$ and $f_3 = \{\beta, \gamma\}$. Then

• $H - f_1 - f_2$ is two $Z_4$'s sharing $\beta$;

• $(H - f_1)_{f_2}$ is easy to do if you consider two cases depending on whether $\beta$ and $\delta$ have the same
or different colors, giving $x(x-1)(x-2)^4 + x(x-1)^4$;

• $H_{f_1} - f_3$ is a $Z_5$ and a triangle with a common edge and

• $H_{f_1 f_3}$ is three triangles joined by common edges.

6.2.8. A term of the sum on the right hand side of (6.5) counts the number of functions f from $\underline{n}$ to
$\underline{x}$ for which $|\mathrm{Image}(f)| = k$.

6.2.9. This can be done by induction on the number of edges. The starting situation involves some
number n of vertices with no edges. Since the chromatic polynomial is $x^n$, the result is proved for
the starting condition.

Now for the induction. Deletion does not change the number of vertices, but reduces the number
of edges. By induction, it gives a polynomial for which the coefficient of $x^k$ is a nonnegative multiple
of $(-1)^{n-k}$. Contraction decreases both the number of vertices and the number of edges by 1 and so
gives a polynomial for which the coefficient of $x^k$ is a nonnegative multiple of $(-1)^{n-1-k}$. Subtracting
the two polynomials gives one where the coefficient of $x^k$ is a nonnegative multiple of $(-1)^{n-k}$.
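
The induction mirrors a direct computation. As a hedged illustration, the sketch below computes the coefficients of a chromatic polynomial by deletion and contraction, so the sign pattern can be inspected on small graphs. The function names and the representation of a graph as a vertex set plus a set of two-element edges are choices made here, not the text's.

```python
def chromatic_poly(vertices, edges):
    """Return [c_0, ..., c_n] with P_G(x) = sum of c_k x^k, via deletion and contraction."""
    return _dc(frozenset(vertices), frozenset(frozenset(e) for e in edges))

def _dc(vertices, edges):
    n = len(vertices)
    if not edges:
        return [0] * n + [1]          # no edges: the polynomial is x^n
    e = next(iter(edges))
    u, v = tuple(e)
    rest = edges - {e}
    deleted = _dc(vertices, rest)     # G - e has the same n vertices
    # Contract e by renaming v to u; parallel edges collapse because edges form a set.
    merged = frozenset(frozenset(u if w == v else w for w in f) for f in rest)
    contracted = _dc(vertices - {v}, merged) + [0]   # pad to length n + 1
    return [a - b for a, b in zip(deleted, contracted)]

def signs_alternate(coeffs):
    """Check that c_k is a nonnegative multiple of (-1)^(n-k), where n is the degree."""
    n = len(coeffs) - 1
    return all(c * (-1) ** (n - k) >= 0 for k, c in enumerate(coeffs))
```

The running time doubles with each edge, so this is suitable only for small graphs such as the cube of Exercise 6.2.7.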


Section  6.3 

6.3.1. Every face must contain at least four edges and each side of an edge contributes to a face.
Thus $4f \le$ (edge sides) $= 2e$. From Euler's relation,

$2 = v - e + f \le v - e + e/2 = (2v - e)/2$

and so $e \le 2v - 4$.

6.3.2. One can see that it contains no cycle of length 3. (Either study it or note that it is bipartite.)
By the previous exercise, we must have $e \le 2v - 4$ if it is planar; however, v = 6 and e = 9.

6.3.3 (a) We have $2e = f d_f$ and $2e = v d_v$. Use this to eliminate v and f in Euler's relation.

(b)  They  are  cycles. 

(c) If $d_f \ge 4$ and $d_v \ge 4$, we would have $0 < 2/d_f + 2/d_v - 1 \le 0$, a contradiction. Thus at least
one of $d_v$ and $d_f$ is 3. Since $d_v \ge 3$, we have $2/d_v \le 2/3$. Thus

$0 < \frac{2}{d_f} + \frac{2}{d_v} - 1 \le \frac{2}{d_f} - \frac{1}{3}$

and so $d_f < 2/(1/3) = 6$. Since $d_f$ is an integer, $d_f \le 5$. Since $d_f \ge 3$ for a simple graph,
interchanging f and v in the above gives us $d_v \le 5$.

(d) Altogether there are 5 possibilities for the pair $(d_v, d_f)$ by the previous part of the exercise.
Given $d_f$ and $d_v$, we can solve (6.9) for e. Then $v d_v = 2e$ and $f d_f = 2e$ give v and f. The five
graphs turn out to be the Platonic solids with the interiors removed. (They are the tetrahedron,
cube, octahedron, dodecahedron and icosahedron.)
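
The five cases can be enumerated mechanically. The following sketch assumes the relation obtained by substituting $v = 2e/d_v$ and $f = 2e/d_f$ into Euler's relation, namely $e(2/d_v + 2/d_f - 1) = 2$ (presumably the content of (6.9)), and keeps the pairs with $3 \le d_v, d_f \le 5$ for which e, v and f are positive integers. The function name is hypothetical.

```python
from fractions import Fraction

def regular_planar_parameters():
    """List (d_v, d_f, v, e, f) for 3 <= d_v, d_f <= 5 satisfying Euler's relation."""
    solutions = []
    for d_v in range(3, 6):
        for d_f in range(3, 6):
            excess = Fraction(2, d_v) + Fraction(2, d_f) - 1
            if excess <= 0:
                continue                      # Euler's relation cannot hold
            e = 2 / excess                    # from e * (2/d_v + 2/d_f - 1) = 2
            v = 2 * e / d_v
            f = 2 * e / d_f
            if e.denominator == v.denominator == f.denominator == 1:
                solutions.append((d_v, d_f, int(v), int(e), int(f)))
    return solutions
```

For example, the pair $(d_v, d_f) = (3, 4)$ yields v = 8, e = 12 and f = 6, the cube.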

6.3.4.  This  pattern  actually  appears  on  a  soccer  ball,  so  one  could  simply  get  a  soccer  ball  and 
count.  We'll  use  Euler's  relation. 

Since each carbon atom is joined to 3 others, it lies on 3 edges, so there are $3v = 180$ ends of
edges. Since each edge has 2 ends, $2e = 180$ and so $e = 90$. Suppose there are $f_5$ pentagons and
$f_6$ hexagons. Since each edge appears on two faces, $5f_5 + 6f_6 = 2e = 180$. By Euler's relation,
$f_5 + f_6 = e - v + 2 = 32$. Solving the pair of equations

$5f_5 + 6f_6 = 180$ and $f_5 + f_6 = 32$,

we obtain 12 pentagons and 20 hexagons.

6.3.5.  The  value  of  c  is  zero.  Suppose  when  we  cut  as  directed  we  cut  through  k  edges.  Each  of 
these  edges  now  becomes  two,  giving  us  k  new  edges.  The  same  happens  with  the  k  faces.  On  each 
of  the  circles  that  we  fill  in  with,  we  also  get  k  edges  and  k  vertices.  The  two  circles  give  us  2  new 
faces. In summary, if we originally had $|V|$ vertices, $|E|$ edges and $f$ faces on the torus, we now have
a graph embedded on the sphere with $|V| + 2k$ vertices, $|E| + k + 2k$ edges, and $f + k + 2$ faces. From
Euler's relation on the sphere,

$2 = (|V| + 2k) - (|E| + 3k) + (f + k + 2) = |V| - |E| + f + 2.$

Thus $|V| - |E| + f = 0$.

There's a subtle issue here: We described the cut as if each edge and face it encountered was
different. This may not be the case: an edge (and face) can twist around the torus so that the cut
meets it more than once; however, the counts are still correct. One way to see this is to imagine
what  happens  if  we  cut  around  the  face  and  stretch  it  flat.  Stretching  will  distort  our  "bracelet  cut" 
into  some  sort  of  curve  that  may  cut  through  the  face  several  times.  Every  time  it  passes  through 
the  face  it  creates  another  face,  two  edges  and  two  vertices. 

6.3.6  (b)  abcdace. 

(c)  abcdadc. 

6.3.7. One method is to list all the simple planar graphs with $V = \underline{5}$ and find the least colorings
for them. We use a theoretical argument instead.

The lex least proper coloring of $\underline{k} \subseteq V$ uses at most the first k colors. If it uses all k colors,
then vertex k must be connected to each of the other vertices and the first $k - 1$ vertices must use
all of the first $k - 1$ colors.

Let's apply these observations with k = 5, 4, 3 and 2 to a graph whose lex least coloring takes
5 colors. With k = 5, we see that vertex 5 is connected to each of the first 4 vertices and they use
4 different colors. Now, with k = 4, we see that vertex 4 is connected to each of the first 3 vertices
and they use 3 different colors. Doing the same thing with k = 3 and k = 2, we finally see that every
vertex is connected to every other; i.e., the graph is $K_5$, which is not planar.

6.3.8.  A  solution  is 

E  =  {{1,2},{1,3},{1,6},{2,3},{2,4},{2,5}, 
{2, 6},  {3, 4},  {3, 5},  {3, 6},  {4, 5},  {5, 6}}. 

6.3.9. The argument for degree 4 is correct. For degree 5, we can assume, perhaps after rotating
and/or flipping the graph, that $y_1, \ldots, y_5$ are assigned colors $c_1$, $c_2$, $c_3$, $c_4$ and $c_2$, respectively.
Suppose we look at $y_1$ and $y_3$ as in the text. The argument given there is okay if we get $y_1$ and $y_3$
in separate components. If they are in the same component, we end up switching colors $c_2$ and $c_4$
in the component of the subgraph colored by $c_2$ and $c_4$ that contains $y_4$. The colors of $y_1, \ldots, y_4$
are now $c_1$, $c_2$, $c_3$ and $c_2$. If $y_5$ was not in the same component with $y_4$, it is colored $c_2$ and we are
done. Unfortunately, if $y_4$ and $y_5$ are in the same component, the color of $y_5$ is switched to $c_4$. You should
convince yourself that there is no way to arrange things to avoid this possibility.

6.3.10.  The  figure  on  the  torus  is  essentially  unique,  but  if  we  try  to  draw  it  in  the  plane,  there  are 
many  possibilities.  Figure  S.6.1  shows  one  possible  picture. 

6.3.14 (a) Direct each edge {u, v} with $\lambda(u) < \lambda(v)$ as follows: If {u, v} = {s, t}, the edge is (v, u);
otherwise, it is (u, v). We must show that, for all vertices x, y, there is a directed path from
x to y. It suffices to prove that there is a directed walk from x to y. To do this, it suffices to
show that for any vertex u there is a directed walk from s to t that contains u, for then we can
walk from u to t, use the edge (t, s) and, finally, walk from s to v.

We now construct a directed walk from s to t through u. If u = s or u = t, it suffices
to replace u by any other vertex of G. (Such a vertex exists because G has at least two edges
and so must have more than two vertices.) To get a walk from s to u, induct on $k = \lambda(u) - 1$
and to get one from u to t, induct on $k = n - \lambda(u)$. Definition 6.4(c) can be used to start the
induction at k = 1 and to carry out the induction step.

Figure S.6.1 A seven-color map on the torus for Exercise 6.3.10. Cut the square along the dashed lines
and  tape  opposite  sides  together  to  form  a  torus.  (To  do  this,  the  figure  must  be  on  stretchable  material, 
like  a  rubber  sheet.)  Partial  extra  copies  of  the  square  are  shown  above  and  to  the  right  of  the  square  so  you 
can  more  easily  see  where  graph  edges  go. 


(b) Suppose that {u, v} is a bicomponent of G. Then no cycle contains {u, v} and so removing the
edge {u, v} disconnects G; that is, {u, v} is an isthmus of G.

(c) Let u and v be vertices of the graph G. Since G is connected, there is a path $u = u_1, \ldots, u_k = v$
in G. We will induct on k. If k = 2, then {u, v} is an edge and belongs to some bicomponent
of G and we are done by (a). Otherwise, let i be as large as possible so that $u_i$ is in the same
bicomponent as u. Then $i \ge 2$, there is a directed path from u to $u_i$ by (a) and there is a
directed path from $u_i$ to $u_k$ by the induction hypothesis. This gives us a directed walk from u
to v.

6.3.15. We know from the text that a biconnected graph has an st-labeling. If $|V| = 2$, the result
is trivial. Suppose that we have an st-labeling and that {x, y} is an edge different from {s, t}. We
may assume that $\lambda(x) < \lambda(y)$. By (iii) in the definition of st-labeling, we can find a sequence
$y = w_1, w_2, \ldots = t$ such that $\lambda(w_i)$ is strictly increasing and such that $\{w_i, w_{i+1}\} \in E$. Similarly, we
can find $x = u_1, u_2, \ldots = s$. These two paths with {s, t} and {x, y} form a cycle of G and so {x, y}
and {s, t} are in the same bicomponent.

Section 6.4


6.4.1  (a)   The  value  of  a  maximum  flow  is  45.  Every  maximum  flow  /  will  have  f{q,  /)  =  10.  Some 
other values of f are also determined uniquely, but many are not; for example, the flow into
r can have any value from 15 to 20. Of course, the flows on the minimum cut set are unique.
There are four minimum cut sets. The one found using A(f) is

{{r, h}, {/, a}, {k, e}, {y, u}, {z, u}}.


The  others  are  obtained 

(i)  by  deleting  {r,  h}  and  adding  {h,  a}  and  {h,  c}, 

(ii)  by  deleting  {y,  u}  and  {z,  u}  and  adding  {u,  n},  or 

(iii)  by  doing  both  (i)  and  (ii). 

(b) See the previous solution.

(c) The value of a maximum flow is 25. Every maximum flow f will have f(v, q) = 10. Some other
values of f are also determined uniquely, but many are not. There is just one minimum cut
set:

{{c,  d},  {k,  e},  {r,  x},  {w,  x}}. 

(d)  See  the  previous  solution.  Since  we  do  not  have  tools  for  finding  all  minimum  cut  sets,  you 
may  not  have  been  able  to  prove  that  the  minimum  cut  set  was  unique. 

6.4.2. Let $(a, b) \in \mathrm{FROM}(A, B)$ with $f(a, b) < c(a, b)$. By the definition of A, there is an
augmentable path from some $D \in \mathcal{D}_{in}$ to a. Let $\delta$ be its increment. Since $b \notin A$, b is not on this path. By
appending b to this path, we obtain an augmentable path with increment $\min(\delta, c(a, b) - f(a, b)) > 0$,
implying that $b \in A$, a contradiction.

Now suppose that $(b, a) \in \mathrm{FROM}(B, A)$ and $f(b, a) > 0$. As before, there is an augmentable
path with end a and increment $\delta$ to which we can append b to obtain an augmentable path with
increment $\min(\delta, f(b, a))$.
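
To see these two facts on a concrete network, the sketch below repeatedly augments along shortest augmentable paths (the Edmonds-Karp variant, swapped in here for definiteness; the text's algorithm may choose paths differently) and then returns the set A of vertices reachable by augmentable paths. The single-source, single-sink interface and all names are assumptions made for this illustration.

```python
from collections import deque

def max_flow_with_cut(capacity, source, sink):
    """capacity maps directed edges (u, v) to nonnegative ints. Returns (value, flow, A)."""
    flow = {e: 0 for e in capacity}
    vertices = {x for e in capacity for x in e}

    def augmentable_tree():
        """Breadth-first search along augmentable steps; parent[w] = (u, edge, sign)."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in vertices:
                if w in parent:
                    continue
                if capacity.get((u, w), 0) - flow.get((u, w), 0) > 0:
                    parent[w] = (u, (u, w), +1)     # forward edge with spare capacity
                    queue.append(w)
                elif flow.get((w, u), 0) > 0:
                    parent[w] = (u, (w, u), -1)     # backward edge carrying flow
                    queue.append(w)
        return parent

    while True:
        parent = augmentable_tree()
        if sink not in parent:
            break                                   # no complete augmentable path
        path, w = [], sink
        while parent[w] is not None:
            u, edge, sign = parent[w]
            path.append((edge, sign))
            w = u
        delta = min(capacity[e] - flow[e] if s > 0 else flow[e] for e, s in path)
        for e, s in path:
            flow[e] += s * delta
    A = set(parent)
    value = (sum(f for (u, _), f in flow.items() if u == source)
             - sum(f for (_, w), f in flow.items() if w == source))
    return value, flow, A
```

On termination every edge from A to its complement is saturated and every edge back into A carries no flow, which is exactly what the two paragraphs above establish.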

6.4.3. Since no complete augmentable path exists, $\mathcal{D}_{in} \subseteq A \subseteq V - \mathcal{D}_{out}$. Since $b(v) = 0$ for $v \notin \mathcal{D}$, it
follows that $\sum_{v \in A} b(v) = \sum_{v \in \mathcal{D}_{in}} b(v)$, which is the definition of the value of a flow. Recall that $b(v)$
is the sum of all flows out of v minus the sum of all flows into v. It follows that for $e = (x, y) \in E$,
$b(x)$ has a contribution of $f(x, y)$ and $b(y)$ has a contribution of $-f(e)$. We distinguish four cases
according as x and y are in A or B and ask what $f(e)$ contributes to $\sum_{v \in A} b(v)$.

(i) $x \in B$, $y \in B$: Then $f(e)$ contributes nothing to the sum.

(ii) $x \in A$, $y \in A$: Then $f(e)$ contributes both $f(e)$ and $-f(e)$, which gives a net contribution
of zero.

(iii) $x \in A$, $y \in B$; i.e., $e \in \mathrm{FROM}(A, B)$: Then $f(e)$ contributes $f(e)$ to the sum.

(iv) $x \in B$, $y \in A$; i.e., $e \in \mathrm{FROM}(B, A)$: Then $f(e)$ contributes $-f(e)$ to the sum.

6.4.4.  A  maximum  flow  will  have  5  liters/sec  flowing  into  and  2  liters/sec  flowing  into  D4  from 
P2.  When  the  pumps  worked  better,  we  could  put  a  greater  demand  on  Pi  and  so  get  8  liters/sec 
flowing  into  D^. 

6.4.5 (a) Without examining the network in detail, we would need to let $c_1$ and $c_2$ (resp. $c_3$ and $c_4$)
be the sum of the capacities of edges leaving (resp. entering) the corresponding $P_i$. That way
we can guarantee the capability of supplying (resp. removing) as much fluid as the pump could
possibly send out to (resp. get in from) other sources. If we know all the maximum flows
for the original network, we may be able to improve on this: We need to set $c_i$ to the largest
net flow out of (resp. into) $D_i$ for all maximum flows in the original network. This leads to no
improvement in this case.

(b) Yes. Let $f'$ be a flow in the new network shown for the exercise. With the edges removed
and the $P_i$ pumps converted back to depots. If we eliminate these edges from $f'$ we obtain a
flow $f$ in the network of Figure 6.6. We'll have value($f$) = value($f'$) because the sum of the net
flows out of $D_1$ and $D_2$ for $f$ equals the net flow out of $D_0$ for $f'$ because $b(P_1) = b(P_2) = 0$
for $f'$.

6.4.6. Let (A, B) be a cut partition and consider any directed path P from a source to a sink. The
path starts in A and ends in B. If x is the last vertex of P that is in A and y is the next vertex of P,
then $(x, y) \in \mathrm{FROM}(A, B)$. Since any path P from a source to a sink has an edge in $\mathrm{FROM}(A, B)$,
this is a cut set.

Conversely, suppose that F is a cut set. Let A consist of $\mathcal{D}_{in}$ and all vertices that can be
reached from $\mathcal{D}_{in}$ along a directed path containing no edges of F. Since F is a cut set, A contains
no sinks and so (A, B) with $B = V - A$ is a cut partition. Suppose $e = (x, y) \in \mathrm{FROM}(A, B)$. Since
$x \in A$, there is a path P from a source to x that contains no edge of F. Append e to this path. Since
$y \notin A$, the new path contains an edge in F. Thus $e \in F$ and so $\mathrm{FROM}(A, B) \subseteq F$. (Note that we
generally do not have equality because F may be "too large;" e.g., we could have F = E.)

6.4.7. Let f and g be two maximum flows and let $A = A(f)$. By the proof of the Augmentable
Path Theorem, we see that value($g$) = value($f$) if and only if $g(e) = c(e)$ for all $e \in \mathrm{FROM}(A, B)$
and $g(e) = 0$ for all $e \in \mathrm{FROM}(B, A)$. It is tempting to conclude that therefore $A = A(g)$, but this
does not follow immediately.

Here is a correct proof. As above, let $A = A(f)$. If $A \ne A(g)$, we can assume that there is some
$v \in A(g)$ with $v \notin A$. (If not, interchange the names of f and g.) Let $u_1, u_2, \ldots$ be an augmentable
path for g that ends at v. Let $\delta$ be its increment. Since $v \notin A$ and $u_1 \in \mathcal{D}_{in} \subseteq A$, there is an i
with $u_i \in A$ and $u_{i+1} \in B$. If $e = (u_i, u_{i+1})$ is the directed edge of G, then $g(e) \le c(e) - \delta$ and
$e \in \mathrm{FROM}(A, B)$. If $e = (u_{i+1}, u_i)$ is the directed edge of G, then $g(e) \ge \delta$ and $e \in \mathrm{FROM}(B, A)$.
In either case, the idea in the previous paragraph proves that value($g$) < value($f$), contradicting the
assumption that g is a maximum flow.

6.4.8 (a) Let $a_1, \ldots, a_n$ be a system of distinct representatives. Since they are distinct, they must
be a permutation of $\underline{n}$. Since $a_i \in A_i$, it is different from all other entries in the ith column
of L.

(b) We have $s \in A_i$ if and only if s does not appear in the ith column of L. Since s appears once
in each row, it appears exactly r times in L. Since each appearance is in a different column, it
appears in r columns and so does not appear in exactly r of the $A_i$'s.

(c) (Since the ith column contains r distinct integers, $|A_i| = n - r$.) We use the Philip Hall Theorem.
Suppose that $I \subseteq \underline{n}$. Form a list (ordered or not) by including the elements of each $A_i$ for $i \in I$.
Since $|A_i| = n - r$, the list has $(n-r)|I|$ elements, not all of which need be distinct. By the previous part, each element
appears on the list at most $n - r$ times. Thus the list contains at least $(n-r)|I|/(n-r) = |I|$ distinct
elements. (When n = r, this breaks down because we have 0/0.)

(d) Suppose L is an $r \times n$ Latin rectangle. It can trivially be completed when $n - r = 0$, since it is
already a square. Suppose that $r < n$. By the previous part of the exercise, we can add a row
to L to obtain an $(r+1) \times n$ Latin rectangle $L'$. Since $n - (r+1) < n - r$, $L'$ can be completed
by induction.
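
The induction in (d) is constructive once a system of distinct representatives can be found. The sketch below is one way to carry it out, under assumptions made here: entries are the integers 1, ..., n, and an SDR is found with a standard augmenting-path bipartite matching (a technique substituted for illustration; the text only asserts existence through the Philip Hall Theorem). Function names are hypothetical.

```python
def find_sdr(sets):
    """Return a list a with a[i] in sets[i] and all a[i] distinct, or None if none exists."""
    owner = {}                                   # element -> index of the set it represents

    def try_assign(i, seen):
        for s in sets[i]:
            if s in seen:
                continue
            seen.add(s)
            # Use s if it is free, or if its current set can switch to another element.
            if s not in owner or try_assign(owner[s], seen):
                owner[s] = i
                return True
        return False

    for i in range(len(sets)):
        if not try_assign(i, set()):
            return None
    representatives = [None] * len(sets)
    for s, i in owner.items():
        representatives[i] = s
    return representatives

def complete_latin_rectangle(rows, n):
    """Extend an r x n Latin rectangle on 1..n to an n x n Latin square, one row at a time."""
    rows = [list(r) for r in rows]
    while len(rows) < n:
        # A_i = integers missing from column i.
        missing = [set(range(1, n + 1)) - {r[i] for r in rows} for i in range(n)]
        new_row = find_sdr(missing)
        if new_row is None:
            raise ValueError("input is not a Latin rectangle")
        rows.append(new_row)
    return rows
```

Each new row is an SDR of the sets $A_i$, so by part (a) it keeps every column free of repeats.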

6.4.9 (c) By induction, there is an SDR for the $A_i$, $i \in I$. If the claimed inequality is true, then
there is also an SDR for the $B_i$, $i \in \underline{n} - I$. Taken together, these give us our representatives.
It remains to prove the inequality. We have

$\bigcup_{i \in I \cup R} A_i = \Bigl(\bigcup_{i \in R} B_i\Bigr) \cup X,$

where the last union is disjoint. Thus

$\Bigl|\bigcup_{i \in R} B_i\Bigr| = \Bigl|\bigcup_{i \in R \cup I} A_i\Bigr| - |X| \ge |R \cup I| - |I| = |R|.$

6.4.10. Following the hint, we make a network with one source u and one sink v. Let each edge have
capacity 1. Given a set of directed edge disjoint paths, define a flow f by setting $f(e) = 1$ if e belongs
to any of the paths in the set and $f(e) = 0$ otherwise. You should be able to see that this is indeed
a flow and that value($f$) is the number of paths. Thus the maximum number of edge disjoint paths
does not exceed the value of the network's maximum flow. We will show how to associate value($g$)
edge disjoint paths with an integer valued flow g. The Integer Flow and Max-Flow Min-Cut Theorems
will complete the proof.

We'll show how to build up a set of edge disjoint paths from g. Choose an edge $e_1 = (u_0, u_1)$
with $u_0 \in \mathcal{D}_{in}$ and $g(e_1) = 1$. Since g is a flow, either $u_1 \in \mathcal{D}$ or there is an edge $e_2 = (u_1, u_2)$
with $g(e_2) = 1$. Proceeding in this manner, we obtain a directed path $e_1, \ldots, e_k = (u_{k-1}, u_k)$ with
$u_k \in \mathcal{D}$. Set $g'(e_i) = 0$ for $1 \le i \le k$ and $g'(e) = g(e)$ otherwise to obtain a new flow. If $u_k = u$,
value($g'$) = value($g$). If $u_k = v$, value($g'$) = value($g$) $- 1$ and we add the path to our set of paths.
Iterating this process, we eventually reach a flow with value zero and a set of value($g$) edge disjoint
paths from u to v.
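
The peeling process just described is short to write down. In the sketch below a 0/1 flow is represented, as an assumption of this illustration, by the set of directed edges carrying one unit; the function name is hypothetical. Each pass follows flow-carrying edges from u until it reaches u or v, removes them from the flow, and keeps the result if it ended at v.

```python
def peel_paths(flow_edges, u, v):
    """Split a 0/1 flow from u to v into edge disjoint directed walks from u to v.

    flow_edges is a set of directed edges (x, y), each carrying one unit of flow.
    A walk may revisit a vertex; cutting out the loop turns it into a path.
    """
    remaining = set(flow_edges)
    paths = []
    while True:
        start = next((e for e in sorted(remaining) if e[0] == u), None)
        if start is None:
            break                                  # no flow leaves u any more
        remaining.discard(start)
        walk = [start]
        while walk[-1][1] not in (u, v):
            x = walk[-1][1]
            # Conservation of flow at x guarantees an unused edge leaving x.
            nxt = next(e for e in sorted(remaining) if e[0] == x)
            remaining.discard(nxt)
            walk.append(nxt)
        if walk[-1][1] == v:
            paths.append(walk)                     # value drops by 1; keep this one
        # If the walk returned to u, the value is unchanged and the walk is discarded.
    return paths
```

Because every edge is removed from `remaining` as soon as it is used, no edge can appear in two of the returned walks.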

6.4.11. The result in the previous exercise is valid when all edges are taken to be undirected. To
see this, construct a directed graph by replacing each edge {x, y} of G with the two edges (x, y) and
(y, x). The first part of the previous proof goes through. If a directed path $e_1, e_2, \ldots$ is constructed
from a flow, replace each edge (x, y) in the directed path with {x, y}. This gives what we will call
a pseudo-path. The same edge may appear twice in the pseudo-path because there may be two
directed edges $e_i = (x, y)$ and $e_j = (y, x)$ which give the same undirected edge. We may assume that
$i < j$. Replace the pseudo-path with the pseudo-path obtained from $e_1, \ldots, e_{i-1}, e_{j+1}, \ldots$. Iterating
this process eventually leads to a path from u to v. (You may want to fill in some details about
that.)

6.4.12.  The  problem  pretty  well  states  what  is  to  be  proved.  For  the  undirected  case,  first  convert 
it  to  a  directed  graph  as  done  in  the  previous  exercise.  For  the  directed  case,  split  each  vertex  x 
as done in Exercise 6.4.4 to obtain a pair of vertices and an edge $(x', x'')$ connecting them. Let the
source be $u''$ and the sink $v'$. When a directed path is obtained, reverse the steps. Note that each
vertex can appear in at most one directed path because $c(x', x'') = 1$ in the altered graph.

Section 6.5

6.5.1 (a) The probability that a vertex v has degree d is $\binom{n-1}{d} p^d (1-p)^{n-1-d}$ since we must choose
d of the remaining $n - 1$ vertices to connect to v, then multiply by the probability of an edge
being present ($p$) or absent ($1 - p$). Probabilities multiply since edges are independent in $\mathcal{G}_p(n)$.
Using linearity of expectation and summing over all n vertices, we get $n \binom{n-1}{d} p^d (1-p)^{n-1-d}$.

(b) If C is a potential 4-cycle of 4 vertices, let $X_C = 1$ if the cycle is present and $X_C = 0$ if it is not.
Then $E(X_C) = p^4$. We must multiply this by the number of choices for C; that is, the number
of potential 4-cycles. This number is $\binom{n}{4} \times 3 = \frac{n(n-1)(n-2)(n-3)}{4 \times 2}$, which can be derived in at
least two ways:

• Note that there are 3 ways to make a 4-cycle out of a set of 4 vertices.

• Choose an ordered list of 4 vertices that represent walking around a cycle. There are 4
vertices that could have been chosen as the starting vertex and 2 ways we could have
gone around the cycle.

(c) This is the same as the previous situation, except that now we must make sure the two edges
that cut across the 4-cycle are not present. Hence the answer is $3 \binom{n}{4} p^4 (1-p)^2$.
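
For small n both expectations can be confirmed exactly by summing over every graph on n vertices. The sketch below does this with rational arithmetic; it is an illustrative check written for this purpose, with hypothetical function names, and is feasible only for very small n because there are $2^{\binom{n}{2}}$ graphs.

```python
from fractions import Fraction
from itertools import combinations

def potential_four_cycles(vertices):
    """All potential 4-cycles, each as (set of cycle edges, set of the two crossing edges)."""
    cycles = []
    for a, b, c, d in combinations(vertices, 4):
        # The 3 ways to arrange four vertices around a cycle.
        for w, x, y, z in ((a, b, c, d), (a, b, d, c), (a, c, b, d)):
            edges = {frozenset(pair) for pair in ((w, x), (x, y), (y, z), (z, w))}
            chords = {frozenset((w, y)), frozenset((x, z))}
            cycles.append((edges, chords))
    return cycles

def expected_four_cycles(n, p):
    """Exact (E[# 4-cycles], E[# 4-cycles with no crossing edge]) in the random graph."""
    all_edges = [frozenset(e) for e in combinations(range(n), 2)]
    cycles = potential_four_cycles(range(n))
    total = induced = Fraction(0)
    for mask in range(1 << len(all_edges)):
        present = {e for i, e in enumerate(all_edges) if (mask >> i) & 1}
        prob = p ** len(present) * (1 - p) ** (len(all_edges) - len(present))
        for edges, chords in cycles:
            if edges <= present:
                total += prob
                if not (chords & present):
                    induced += prob
    return total, induced
```

With n = 5 and p = 1/3 the two values should agree with $3\binom{5}{4}p^4$ and $3\binom{5}{4}p^4(1-p)^2$.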

6.5.2 (a) For each injection $\varphi : V_H \to \underline{n}$, let $X_\varphi = 1$ if the injection is an embedding of H into the
random graph and let $X_\varphi = 0$ otherwise. Then $E(X_\varphi) = p^{|E_H|}$. The number of choices for $\varphi$
is $n(n-1) \cdots (n - |V_H| + 1) = \frac{n!}{(n - |V_H|)!}$.

(b) The number of possible edges in a set of $|V_H|$ vertices is $\binom{|V_H|}{2}$. The
answer is $\frac{n!}{(n - |V_H|)!} \, p^{|E_H|} (1-p)^{\binom{|V_H|}{2} - |E_H|}$.

(c) Suppose $V_H = \{a, b, c\}$ and that {p, q, r} are the vertices of a triangle in a random graph. The
triangle formed by {p, q, r} is counted once in Example 6.13. In part (a) it is counted 6 times
because there are 6 injections $\varphi : \{a, b, c\} \to \{p, q, r\}$.

6.5.3 (a) The probability of a cycle is the probability of the union of the sets $\mathcal{G}_C$. The probability
of the union of sets is less than or equal to the sum of their separate probabilities; that is,
$\Pr(A \cup B \cup \cdots) \le \Pr(A) + \Pr(B) + \cdots$.

(b) The denominator is $|\mathcal{G}(n, k)|$. The numerator counts graphs as follows. There are $(c-1)!$
directed cycles. Since each cycle can be made directed in two ways, there are $(c-1)!/2$ cycles.
Since we have used up c edges making the cycle, we must choose $k - c$ edges from the remaining
$N - c$ unused edges.

(c) Collect terms in (a) according to $c = |C|$ and use (b). There are $\binom{n}{c}$ c-subsets of $\underline{n}$.

(d)  The  left  side  comes  from  writing  (^^J  =  gnd  doing  some  algebra.  The  inequality 

comes  from  77-^^7  <  fc^  and         <  -  when  y  >  x  >  j. 

6.5.4 (a) We need k edges. Since each has probability p and they are independent in $\mathcal{G}_p(n)$, the
answer is $p^k$.

(b) There are less than $n^k/k$ possible cycles. By (a), each has probability $p^k$.

(c) By (b), the probability of a cycle is less than

$\sum_{k \ge 3} (pn)^k / k < \sum_{k \ge 3} (pn)^k / 3 = \frac{(pn)^3}{3(1 - pn)} < (pn)^3.$

6.5.5 (a) Let T contain as close to half the vertices as possible. If $|V| = 2n$, $|T| = n$ and $|V - T| = n$.
Since G contains all edges, this choice of T gives us a bipartite subgraph with $n^2$ edges. When
$|V| = 2n + 1$, we take $|T| = n$ and $|V - T| = n + 1$, obtaining a bipartite subgraph with $n(n+1)$
edges.

(b) The example bound is $|E|/2$ and $|E| = |\mathcal{P}_2(V)| = |V|(|V| - 1)/2$. For $|V| = 2n$, we have
$|E|/2 = n(2n-1)/2 = n^2 - n/2$. Hence the bound is off by $n/2$. This may sound large, but
the relative error is small: Since $(n^2 - n/2)/n^2 = 1 - 1/2n$, the relative error is $1/2n$. We omit
similar calculations for $|V| = 2n + 1$.

(c)  The  idea  is  to  construct  the  largest  possible  complete  graph  and  then  add  edges  in  any  manner 
whatsoever.  Let  m  be  the  largest  integer  such  that  k  >  ('^),  choose  S  C  V  with  \S\  =  m, 
construct  a  complete  graph  on  m  vertices  using  (™)  edges,  and  insert  the  remaining  k  —  (™) 
edges  in  any  manner  to  form  a  simple  graph  G{V,  E).  By  (a),  the  number  of  edges  in  a  bipartite 
subgraph  of  the  complete  graph  on  T  has  at  least  (m/2)^— m  edges  for  some  constant  C  Since  m 
is  as  large  as  possible,  k  <  ("+^)  <  Thus  m  +  1  >  V2k.  Also,  since  k  >  {"^)  >  ^^"V^' 
m  —  1  <  \/2fc.  Hence  the  number  of  edges  in  bipartite  subgraph  is  at  least 

(y2k  -  ly 


{m/2Y  -m  >  ^  ^  V2fc-1, 

Which  equals  k  minus  terms  involving  k^/^  and  constants. 

(d) Call the colors 0, 1, 2. Let $V_i$ be the set of vertices colored with color i and let $E_{i,j}$ be the set
of edges in G that connect vertices in $V_i$ to vertices in $V_j$. Since $|E| = |E_{0,1}| + |E_{0,2}| + |E_{1,2}|$,
at least one of $|E_{i,j}|$ is at most $|E|/3$. Suppose it is $E_{1,2}$. The bipartite subgraph whose edges
connect vertices in $V_0$ to vertices in $V_1 \cup V_2$ contains $|E| - |E_{1,2}| \ge 2|E|/3$ edges.
Section 6.6

6.6.1.  The  left  column  gives  the  input  and  the  top  row  the  states. 


          0    1    2    3    4

    0     0    2    4    1    3
    1     1    3    0    2    4

6.6.2  (a)  and  (b)  The  digraph  can  be  drawn  from  the  transition  table.  There  will  be  3  states, 
0,1,2,  corresponding  to  the  remainder  after  dividing  the  number  by  3.  Thus  the  starting  and 
accepting  state  are  both  0.  Here's  the  transition  table. 


    state \ input    0    1

          0          0    1
          1          2    0
          2          1    2

(c)    Here's  the  transition  table.  The  starting  and  accepting  states  are  both  0. 


    state \ input    0  1  2  3  4  5  6  7  8  9

          0          0  1  2  0  1  2  0  1  2  0
          1          1  2  0  1  2  0  1  2  0  1
          2          2  0  1  2  0  1  2  0  1  2

(d) What follows is probably the easiest and most natural way to think about the problem, but
there is a more general idea that is found in the next part. Since 10 has a remainder of 1 when
divided by 3, the remainder of $a_k 10^k$ when divided by 3 is the same as the remainder of $a_k$.
Thus processing a number from left to right will be the same as from right to left. Consequently
the previous transition table works.

(e) If we attempt to use the idea of the previous part, we see that the remainder of dividing $2^k$ by
3 depends on whether $k$ is odd or even. This suggests that our state consist of two parts, one
for the parity of $k$ and one for the remainder so far. There is another way to think about this
that works in general. Suppose that we are working in base $b$ and looking at divisibility by $d$.
Also suppose that $B$ is such that $bB$ has remainder 1 when divided by $d$. Note that a number
$N$ is divisible by $d$ if and only if $NB^n$ is divisible by $d$ and that

$$B^n(a_n b^n + \cdots + a_1 b^1 + a_0 b^0) = a_0 B^n + a_1 B^{n-1}(bB) + \cdots + a_n B^0 (bB)^n.$$

Using the fact that $(bB)^k$ has remainder 1 when divided by $d$, this number has the same
remainder as $a_0 B^n + \cdots + a_n B^0$. This means that our original number is divisible by $d$ if and
only if the number we get by switching right and left and changing to base $B$ is divisible by $d$.
With $b = 10$ and $d = 3$, we can take $B = 10$ since dividing $bB = 100$ by 3 gives remainder 1.
Hence the solution for left to right works for right to left. As a further example, suppose the
base is still 10 but now $d = 7$. Since 50 has a remainder of 1, we can set $B = 5$.
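The base-switching argument in (e) can be checked numerically. The following is a minimal sketch, not part of the original solution: the helper names `digits_of` and `divisible_right_to_left` are made up for illustration, and the automaton is simulated by a single remainder variable instead of an explicit transition table.

```python
def digits_of(n, b):
    """Return the base-b digits of n >= 0, most significant digit first."""
    ds = [n % b]
    n //= b
    while n > 0:
        ds.append(n % b)
        n //= b
    return ds[::-1]


def divisible_right_to_left(n, b, d, B):
    """Read the base-b digits of n from right to left and feed them to the
    usual 'remainder so far' machine for base B and divisor d.
    Assumes b * B leaves remainder 1 when divided by d."""
    state = 0
    for a in reversed(digits_of(n, b)):
        state = (state * B + a) % d  # one transition of the base-B machine
    return state == 0
```

For instance, `divisible_right_to_left(n, 10, 7, 5)` should agree with `n % 7 == 0`, because $10 \times 5 = 50$ leaves remainder 1 when divided by 7.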

6.6.3. The states are 0, O1, E1 and R. In state 0, a zero has just been seen; in O1, an odd number
of ones; in E1, an even number. The start state is 0 and the accepting states are 0 and O1. The state
R is entered when we are in E1 and see a 0. Thereafter, R always steps to R regardless of input.
You should be able to finish the machine.
    state   σ    .    E    δ    comments on state

    b       s1   sd   z    1a   starting
    s1      z    sd   z    1a   part 1 sign seen
    1a      z    1b   e    1a   part 1 digits seen; accepting
    sd      z    z    z    1b   decimal seen, no digits yet
    1b      z    z    e    1b   part 1 after decimal; accepting
    e       s2   z    z    2    E seen
    s2      z    z    z    2    part 2 sign seen
    2       z    z    z    2    part 2 digits seen; accepting
    z       z    z    z    z    error seen

Figure S.6.2 The transition table for a finite automaton that recognizes floating point numbers. The
possible inputs are sign (σ), decimal point (.), digit (δ) and exponent symbol (E). The comments explain
the states.


6.6.4.  If  you  understand  what  this  automaton  is  recognizing  and  the  significance  of  the  states,  it 
makes  the  problem  easier.  It  is  looking  for  strings  of  digits  which  may  have  a  sign  to  begin  with. 
The  state  b  corresponds  to  having  seen  nothing,  s  to  having  seen  a  sign,  d  a  digit  and  z  something 
illegal. 

(b)  It  recognizes  all  nonempty  strings  of  digits  with  an  optional  sign  at  the  start.  Thus  it  would 
not  recognize  the  string  "+". 

(c)  It  recognizes  all  nonempty  strings  that  consist  of  an  optional  sign  followed  by  digits.  Thus  it 
would  recognize  the  string  "+". 

6.6.5. In our input, we let δ stand for any digit, since the transition is independent of which digit
it is. Similarly, σ stands for any sign. There is a bit of ambiguity as to whether the integer after
the E must have a sign. We assume not. The automaton contains three states that can transit to
themselves: recognizing digits before a decimal, recognizing digits after a decimal and recognizing
digits after the E. We call them 1a, 1b and 2. There is a bit of complication because of the need to
assure digits in the first part and, if it is present, in the second part. The transition table is given in
Figure S.6.2.
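A machine of this kind can be simulated directly from its transition table. The sketch below is illustrative only: it encodes the states named in the comments of Figure S.6.2 as dictionary keys, and it assumes that a decimal point seen in state 1a leads to state 1b (so that a string such as "12." is accepted). The names `TABLE`, `classify` and `accepts` are hypothetical.

```python
# Transition table keyed by state, then by input class.
# "z" is the error state; once entered it is never left.
# Assumption: in state 1a a decimal point leads to 1b.
TABLE = {
    "b":  {"sign": "s1", "point": "sd", "E": "z", "digit": "1a"},
    "s1": {"sign": "z",  "point": "sd", "E": "z", "digit": "1a"},
    "1a": {"sign": "z",  "point": "1b", "E": "e", "digit": "1a"},
    "sd": {"sign": "z",  "point": "z",  "E": "z", "digit": "1b"},
    "1b": {"sign": "z",  "point": "z",  "E": "e", "digit": "1b"},
    "e":  {"sign": "s2", "point": "z",  "E": "z", "digit": "2"},
    "s2": {"sign": "z",  "point": "z",  "E": "z", "digit": "2"},
    "2":  {"sign": "z",  "point": "z",  "E": "z", "digit": "2"},
    "z":  {"sign": "z",  "point": "z",  "E": "z", "digit": "z"},
}
ACCEPTING = {"1a", "1b", "2"}


def classify(ch):
    """Map a character to its input class, or None if it is not allowed."""
    if ch in "+-":
        return "sign"
    if ch == ".":
        return "point"
    if ch == "E":
        return "E"
    if ch in "0123456789":
        return "digit"
    return None


def accepts(text):
    """Run the automaton from the start state b and report acceptance."""
    state = "b"
    for ch in text:
        kind = classify(ch)
        if kind is None:
            return False
        state = TABLE[state][kind]
    return state in ACCEPTING
```

With this table, `accepts("+12.5E-3")` and `accepts(".5")` hold, while `"+"`, `"."` and `"1E"` are rejected because they end in the non-accepting states s1, sd and e.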

6.6.6 (a) Our states will be 0, 1, p, and i, where p indicates that we have just seen what could be
an isolated 1 and i indicates that we have seen an isolated one. The start state is 0 and the
accepting states are p and i. Here's the transition table.
i

i 

i 

(b)    We  can  join  together  what  look  roughly  like  two  copies  of  the  previous  machine.  The  states 
in  the  second  one  are  postfixed  with  an  r  and  are  used  to  look  for  a  second  isolated  one.  The 


54       Foundations  of  Applied  Combinatorics 


    state    5        10        25        A        B        C        R

      0      5        10        25        0        0        0        0
      5      10       15        30        5        5        5        0, R5
     10      15       20        10, R25   10       10       10       0, R10
     15      20       25        15, R25   0, A0    15       15       0, R15
     20                                            0, B0    20       0, R20
     25      30       25, R10   25, R25   0, A10                     0, R25
     30      30, R5   30, R10

Figure S.6.3 The transitions and outputs for an automaton that behaves like a vending machine. The
state is the amount of money held and the input is either money, a purchase choice (A, B, C) or a refund
request (R).


accepting states are p, 0r and 1r.
6.6.7 (a) We need states that keep track of how much money is held by the machine. This leads
us to states named 0, 5, ..., 30. The output of the machine will be indicated by An, Bn, Cn
and Rn, where n indicates the amount of money returned and A, B and C indicate the item
delivered. There may be no output. The start state is 0.

(b)   See  Figure  S.6.3. 

6.6.8 (a) Let

$$\mathcal{M} \times \mathcal{M}' = (S \times S',\ I,\ f \times f',\ (s_0, s_0'),\ A \times A')$$

where $(f \times f')((s, s'), i) = (f(s, i), f'(s', i))$.

(b) There is an edge from $(s, s')$ to $(t, t')$ if and only if there is an $i \in I$ such that $f(s, i) = t$ and
$f'(s', i) = t'$. The edge is associated with the input $i$.

(c) A number is divisible by 15 if and only if it is divisible by 5 and divisible by 3. If the two given
machines are called $\mathcal{M}$ and $\mathcal{M}'$, we simply look at $\mathcal{M} \times \mathcal{M}'$.

(d) The machine in the previous part can be used; however, the accepting states must be those for
which either remainder is zero.
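The product construction can be sketched in a few lines. This is only an illustration under stated assumptions: the component machines are taken to be the "remainder so far" machines for decimal digits read left to right, and the names `mod_machine`, `product_machine`, `run`, `divisible_by_15` and `divisible_by_3_or_5` are hypothetical.

```python
def mod_machine(d):
    """Transition function of the machine tracking a decimal number mod d."""
    return lambda state, digit: (10 * state + digit) % d


def product_machine(f1, f2):
    """Transition function of the product: both components read the same input."""
    def f(state, i):
        s1, s2 = state
        return (f1(s1, i), f2(s2, i))
    return f


def run(f, start, inputs):
    """Apply the transition function f to each input in turn."""
    state = start
    for i in inputs:
        state = f(state, i)
    return state


F15 = product_machine(mod_machine(5), mod_machine(3))


def divisible_by_15(n):
    """Part (c): accept only the state where both remainders are zero."""
    digits = [int(c) for c in str(n)]
    return run(F15, (0, 0), digits) == (0, 0)


def divisible_by_3_or_5(n):
    """Part (d): same machine, accept when either remainder is zero."""
    r5, r3 = run(F15, (0, 0), [int(c) for c in str(n)])
    return r5 == 0 or r3 == 0
```

Only the set of accepting states changes between (c) and (d); the transition function of the product is the same.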


Solutions  Manual  55 


Section  7.1 

7.1.1. $A(m)$ (note $m$, not $n$) is the statement of the rank formula. The inductive step and use of
the inductive hypothesis are clearly indicated in the proof.

7.1.2. $A(n)$ is the claim that $D_1, D_2, \ldots, D_n$ has been chosen by the greedy algorithm and is part of
the correct path. The inductive hypothesis is used in the assumption that $D_1, \ldots, D_{i-1}$ is part
of the correct path for some $D_i$.

7.1.3. Let $A(k)$ be the assertion that the coefficient of $y_1^{m_1} \cdots y_k^{m_k}$ in $(y_1 + \cdots + y_k)^n$ is
$n!/(m_1! \cdots m_k!)$ if $n = m_1 + \cdots + m_k$ and 0 otherwise. $A(1)$ is trivial. We follow the hint for the
induction step. Let $x = y_1 + \cdots + y_{k-1}$. By the binomial theorem, the coefficient of $x^m y_k^{m_k}$ in
$(x + y_k)^n$ is $n!/(m!\,m_k!)$ if $n = m + m_k$ and 0 otherwise. By the induction hypothesis, the coefficient
of $y_1^{m_1} \cdots y_{k-1}^{m_{k-1}}$ in $x^m$ is $m!/(m_1! \cdots m_{k-1}!)$ if $m = m_1 + \cdots + m_{k-1}$ and zero otherwise.
Combining these results we see that the coefficient of $y_1^{m_1} \cdots y_k^{m_k}$ in $(y_1 + \cdots + y_k)^n$ is

$$\frac{n!}{m!\,m_k!} \cdot \frac{m!}{m_1! \cdots m_{k-1}!}$$

if $n = m_1 + \cdots + m_k$ and 0 otherwise.

7.1.4 (a) Since (ii) starts at $n = 2$, the case $D_1 = 1 \cdot D_0 + (-1)^1$ must be proved directly. That's
easy. For $n \ge 2$ we have, with a bit of trickiness,

$$\begin{aligned}
D_n = nD_{n-1} + (-1)^n &= (n-1)D_{n-1} + (-1)^n + D_{n-1} && \text{by (i) at } n \\
&= (n-1)D_{n-1} + (-1)^n + (n-1)D_{n-2} + (-1)^{n-1} && \text{by (i) at } n-1 \\
&= (n-1)(D_{n-1} + D_{n-2}).
\end{aligned}$$

(b) Let $A(n)$ be the claim that $D_n = nD_{n-1} + (-1)^n$. $A(1)$ is easily checked. Now for the inductive
step.

$$\begin{aligned}
D_n = (n-1)(D_{n-1} + D_{n-2}) &= nD_{n-1} + (n-1)D_{n-2} - D_{n-1} \\
&= nD_{n-1} + (n-1)D_{n-2} - \bigl((n-1)D_{n-2} + (-1)^{n-1}\bigr) && \text{by } A(n-1) \\
&= nD_{n-1} + (-1)^n.
\end{aligned}$$

(c) Let $A(n)$ be the desired equation. It is easy to verify $A(0)$. Now for the induction step when
$n \ge 1$. We have

$$D_n = nD_{n-1} + (-1)^n = n(n-1)! \sum_{k=0}^{n-1} \frac{(-1)^k}{k!} + n!\,\frac{(-1)^n}{n!} = n! \sum_{k=0}^{n} \frac{(-1)^k}{k!},$$

where the second equality used $A(n-1)$.

(d) Using (iii) twice we have

$$D_n = n! \sum_{k=0}^{n} \frac{(-1)^k}{k!} = n! \sum_{k=0}^{n-1} \frac{(-1)^k}{k!} + (-1)^n = nD_{n-1} + (-1)^n.$$
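The three descriptions of $D_n$ used in this solution can be compared numerically. The sketch below assumes $D_0 = 1$ and $D_1 = 0$ and takes (i), (ii) and (iii) to be, respectively, $D_n = nD_{n-1} + (-1)^n$, $D_n = (n-1)(D_{n-1} + D_{n-2})$ and the alternating sum formula, which is how the labels are used above; the function names are made up for illustration.

```python
from math import factorial


def d_by_i(n):
    """(i): D_n = n*D_{n-1} + (-1)^n with D_0 = 1."""
    d = 1
    for m in range(1, n + 1):
        d = m * d + (-1) ** m
    return d


def d_by_ii(n):
    """(ii): D_n = (n-1)(D_{n-1} + D_{n-2}) with D_0 = 1 and D_1 = 0."""
    if n == 0:
        return 1
    prev, cur = 1, 0  # D_0, D_1
    for m in range(2, n + 1):
        prev, cur = cur, (m - 1) * (cur + prev)
    return cur


def d_by_iii(n):
    """(iii): D_n = n! * sum_{k=0}^{n} (-1)^k / k!, kept in exact integers."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))
```

All three functions should return the same values, beginning 1, 0, 1, 2, 9, 44, 265.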

7.1.5(a)   x[x2  +  x[x2  =  x[. 

(b)  x[x2+XiX2. 

(c)  x[x'2X3  +  X[X2X3  +  XiX2X'^  +  XiX2X'^    =   x'lXz  +  Xix'^. 

(d)  x[X2X3  +  x\X2X'^  +  x\X2X3  +  XiX^x'^    =   x'-^X2  +  X^X^  +  XiXjXg. 
7.1.6. This can be done in various ways. One possible approach is to use the distributive law to
expand the products. Another is to use the method that was used in proving the theorem.

(a) By the distributive law: $x_1x_2 + x_1x_4 + x_3x_2 + x_3x_4$.
By the proof of the theorem:

(xi  +  a;3)(a;2  +  2:4)  =  (xi  +  a;3)(x2  +  0)x4  +  (xi  +  a;3)(x2  +  l)x4 


There  are  a  variety  of  forms  with  more  terms. 

7.1.7. If you are familiar with de Morgan's laws for complementation, you can ignore the hint and
give a simple proof as follows. By Example 7.3, one can express $f'$ in disjunctive form:
$f' = M_1 + M_2 + \cdots$. Now $f = (f')' = M_1' M_2' \cdots$ by de Morgan's law and, if $M_i = y_1 y_2 \cdots$,
then $M_i' = y_1' + y_2' + \cdots$ by de Morgan's law.

To  follow  the  hint,  replace  (7.5)  with 


and  practically  copy  the  proof  in  Example  7.3. 

7.1.9. We can induct on either $k$ or $n$. It doesn't matter which we choose since the formula we have
to prove is symmetric in $n$ and $k$. We'll induct on $n$. The given formula is $A(n)$. For $n = 0$, the
formula becomes $F_{k+1} = F_{k+1}$, which is true.


(xi  +  a;3)a;2a;4  +  {xi  +  a;3)a;2a;4 

(a;i  +  0)x3.-E2.i^4  +  (xi  +  l)x3.T2.x4  +  {xi  +  0)x';^X2X4  +  {xi  +  l)a;3a;2a;4 

Xix'^X2x'^  +  X[X3X2X'^  +  X\x'^x'2X4,  +  x'-^X^x'2X4,. 


(b) By the distributive law, remembering that $xx = x$ and $x + xy = x$:

$$(x_1 + x_2x_3)(x_2 + x_3) = x_1x_2 + x_1x_3 + x_2x_3.$$


$$f(x_1, \ldots, x_n) = \bigl(g_1(x_1, \ldots, x_{n-1}) + x_n'\bigr)\bigl(g_0(x_1, \ldots, x_{n-1}) + x_n\bigr)$$


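$$\begin{aligned}
F_{n+k+1} &= F_{(n-1)+(k+1)+1} && \text{using the hint} \\
&= F_n F_{k+2} + F_{n-1} F_{k+1} && \text{by } A(n-1) \\
&= F_n (F_{k+1} + F_k) + F_{n-1} F_{k+1} && \text{by definition of } F_{k+2} \\
&= (F_n + F_{n-1}) F_{k+1} + F_n F_k && \text{by rearranging} \\
&= F_{n+1} F_{k+1} + F_n F_k && \text{by definition of } F_{n+1}.
\end{aligned}$$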
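The identity established by this chain, $F_{n+k+1} = F_{n+1}F_{k+1} + F_nF_k$, is easy to spot-check by machine. The sketch below assumes the convention $F_0 = 0$, $F_1 = 1$; the helper names are hypothetical.

```python
def fib_list(limit):
    """Return [F_0, F_1, ..., F_limit] with F_0 = 0 and F_1 = 1."""
    F = [0, 1]
    while len(F) <= limit:
        F.append(F[-1] + F[-2])
    return F


def identity_holds(n, k, F):
    """Check F_{n+k+1} = F_{n+1} F_{k+1} + F_n F_k for one pair (n, k)."""
    return F[n + k + 1] == F[n + 1] * F[k + 1] + F[n] * F[k]
```

A check such as `all(identity_holds(n, k, fib_list(50)) for n in range(20) for k in range(20))` exercises the formula on small cases; it is a sanity check, not a substitute for the induction.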
Section 7.2

7.2.1.  Given  that  p  and  q  are  positive  integers,  it  does  not  follow  that  p'  and  q'  are  positive  integers. 
(For  example,  let  p  =  1.)  Thus  A{n  —  1)  may  not  apply. 

7.2.2.  The  last  sentence  in  the  proof  is  quite  vague:  It  does  not  explain  how  one  is  going  to  actually 
use  the  drawing.  Any  attempt  to  make  the  sentence  more  precise  is  bound  to  fail  because  one  cannot 
carry  out  the  idea  described  there. 

7.2.3. You may object that the induction has not been clearly phrased, but this can be overcome:
Let $I$ be the set of interesting positive integers and let $A(n)$ be the assertion $n \in I$. If $A(1)$ is false,
then even 1 is not interesting, which is interesting. The inductive step is as given in the problem: If
$A(n)$ is false, then since $A(k)$ is true for all $k < n$, $n$ is the smallest uninteresting number, which is
interesting.

Then what is wrong? It is unclear what "interesting" means, so the set of interesting positive
integers is not a well defined concept. Proofs based on foggy concepts are always suspect.

7.2.4. There is no reduction of the problem to a simpler case. We could overcome this by assigning
numbers to the students and making sure that a person always asks someone with a lower number,
but then student number 1 would have no one in the class to turn to.

Section  7.3 

7.3.1 (a) We must compare as long as both lists have items left in them. After all items have been
removed from one list, what remains can simply be appended to what has been sorted. All
items will be removed from one list the quickest if each comparison results in removing an item
from the shorter list. Thus we need at least $\min(k_1, k_2)$ comparisons.

On the other hand, suppose we have $k_1 + k_2$ items and the smallest ones are in the shorter
list. In this case, all the items are removed from the shorter list and none from the longer in
the first $\min(k_1, k_2)$ comparisons, so we have achieved the minimum.

(b) Here's the code. Note that the two lists have lengths $m$ and $n - m$ and that $\min(m, n - m) = m$
because $m \le n/2$.

Procedure c(n)
    c = 0
    If (n = 1), then Return c
    Let m be n/2 with remainder discarded
    c = c + c(m)
    c = c + c(n - m)
    c = c + m
    Return c
End

(c) We have $c(2^0) = 0$ and $c(2^{k+1}) = 2c(2^k) + 2^k$ for $k \ge 0$. The first few values are

$$c(2^0) = 0, \quad c(2^1) = 2^0, \quad c(2^2) = 2 \times 2^1, \quad c(2^3) = 3 \times 2^2, \quad c(2^4) = 4 \times 2^3.$$

This may be enough to suggest the pattern $c(2^k) = k \times 2^{k-1}$; if not, you can compute more
values until the pattern becomes clear.

We prove it by induction. The conjecture $c(2^k) = k \times 2^{k-1}$ is the induction assumption.
For $k = 0$, we have $c(2^0) = 0$, and this is what the formula gives. For $k > 0$, we use the
recursion to reduce $k$ and then use the induction assumption:

$$c(2^k) = 2c(2^{k-1}) + 2^{k-1} = 2 \times (k-1) \times 2^{k-2} + 2^{k-1} = k \times 2^{k-1},$$

which completes the proof.
When $k$ is large,

$$\frac{c(2^k)}{C(2^k)} = \frac{k \times 2^{k-1}}{(k-1)2^k + 1} = \frac{k/2}{k - 1 + 2^{-k}} \approx \frac{1}{2}.$$

This shows that the best case and worst case differ by about a factor of 2, which is not very
large.
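The procedure in (b) translates almost line for line into a runnable function, which makes it easy to test the conjectured pattern. This is a sketch only; the name `best_case_comparisons` is not from the text.

```python
def best_case_comparisons(n):
    """Fewest comparisons merge sort can use on n items, following Procedure c(n)."""
    if n == 1:
        return 0
    m = n // 2  # n/2 with remainder discarded
    # Sort both halves, then merge using min(m, n - m) = m comparisons.
    return best_case_comparisons(m) + best_case_comparisons(n - m) + m
```

For powers of two the values should match $k \times 2^{k-1}$; for example `best_case_comparisons(16)` is $4 \times 2^3 = 32$.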

7.3.2.  We  have 


D{n,l):  ■■■  D{n,k):  ■  ■  ■ 

1        2      •••      n  fe,  £)(fe-l,fe-l)      k+1,  D{k,k-1)    ■■■      n,  D{n-l,k-l) 

7.3.3.  Here  is  code  for  computing  the  number  of  moves. 

Procedure M(n)
    M = 0
    If (n = 1), then Return M
    Let m be n/2 with remainder discarded
    M = M + M(m)
    M = M + M(n - m)
    M = M + n
    Return M
End

This gives us the recursion $M(2^k) = 2M(2^{k-1}) + 2^k$ for $k > 0$ and $M(2^0) = 0$. The first few
values are

$$M(2^0) = 0, \quad M(2^1) = 2^1, \quad M(2^2) = 2 \times 2^2, \quad M(2^3) = 3 \times 2^3, \quad M(2^4) = 4 \times 2^4.$$

Thus we guess $M(2^k) = k2^k$, which can be proved by induction.

7.3.4. We specified that Find was to always report a counterfeit coin, but when we used it recursively
we assumed that it could also report that there was no counterfeit coin. To allow for this, we must
alter "Else report C." C must be compared with some other coin to determine whether or not it
is counterfeit. The corrected algorithm requires $n - 1$ weighings, which is very poor since only about
$\log_2 n$ weighings are needed.

7.3.5  (a)  Here's  one  possible  procedure.  Note  that  the  remainder  must  be  printed  out  after  the 
recursive  call  to  get  the  digits  in  the  proper  order.  Also  note  that  one  must  be  careful  about 
zero:  A  string  of  zeroes  should  be  avoided,  but  a  number  which  is  zero  should  be  printed. 
OUT(m)
    If m < 0, then
        Print "-"
        Set m = -m
    End if
    Let q and 0 ≤ r ≤ 9 be determined by m = 10q + r
    If q > 0, then OUT(q)
    Print r
End

(b)  Single  digits 

(c)  When  OUT  calls  itself,  it  passes  an  argument  that  is  smaller  in  magnitude  than  the  one  it 
received,  thus  OUT(m)  must  terminate  after  at  most  |m|  calls. 

7.3.6  (a) 

DSUM(n)
    If n = 0, Return 0.
    Let q and 0 ≤ r ≤ 9 be determined by n = 10q + r.
    Return DSUM(q) + r.
End

(b)  Zero 

(c)  Same  as  previous  exercise. 
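Both procedures can be rendered as short recursive functions. In the sketch below, `out` returns the string that OUT would print (an assumption made so the result can be checked), and `dsum` follows DSUM directly for nonnegative arguments; the lowercase names are hypothetical.

```python
def out(m):
    """Return the decimal representation of the integer m as OUT prints it."""
    if m < 0:
        return "-" + out(-m)
    q, r = divmod(m, 10)  # m = 10q + r with 0 <= r <= 9
    prefix = out(q) if q > 0 else ""  # recurse first so digits come out in order
    return prefix + "0123456789"[r]


def dsum(n):
    """Return the sum of the decimal digits of n >= 0."""
    if n == 0:
        return 0
    q, r = divmod(n, 10)
    return dsum(q) + r
```

Because the recursion in `out` stops when the quotient is zero, a number that is zero prints as the single digit "0" and no string of leading zeroes appears.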

7.3.7. The description for $k = 1$ is on the left and that for $k > 1$ is on the right:

1  h 

n-  n- 

1    ■■■    n  k,  fc- 1^     •  •  •     n,  n- 1^ 

7.3.8. We will not draw the trees. The moves for $n = 2$ are S→E, S→G and E→G. The
moves for $n = 4$ are (reading row by row)

S→E    S→G    E→G    S→E    G→S    G→E    S→E    S→G

E→G    E→S    G→S    E→G    S→E    S→G    E→G

7.3.9 (a) Let $A(n)$ be the assertion "H(n, S, E, G) takes the least number of moves." Clearly $A(1)$
is true since only one move is required. We now prove $A(n)$. Note that to do S→G we must
first move all the other washers to pole E. They can be stacked only one way on pole E, so
moving the washers from S to E requires using a solution to the Tower of Hanoi problem for
$n - 1$ washers. By $A(n - 1)$, this is done in the least number of moves by H(n - 1, S, G, E).
Similarly, H(n - 1, E, S, G) moves these washers to G in the least number of moves.

(b) Simply replace H(m, ...) with S(m), replace a move with a 1 and adjust the code a bit to
get

Procedure S(n)
    If (n = 1) Return 1.
    M = 0
    M = M + S(n - 1)
    M = M + 1
    M = M + S(n - 1)
    Return M
End

The recursion is $S(1) = 1$ and $S(n) = 2S(n - 1) + 1$ when $n > 1$.

(c) The values are 1, 3, 7, 15, 31, 63, 127.

(d) Let $A(n)$ be "$S(n) = 2^n - 1$." $A(1)$ asserts that $S(1) = 1$, which is true. By the recursion and
then the induction hypothesis we have

$$S(n) = 2S(n - 1) + 1 = 2(2^{n-1} - 1) + 1 = 2^n - 1.$$

(e) By studying the binary form of $k$ and the washer moved for small $n$ (such as $n = 4$) you could
discover the following rule.

If $k = \cdots b_3b_2b_1$ is the binary representation of $k$, $b_j = 1$,
and $b_i = 0$ for all $i < j$, then washer $j$ is moved.

(This simply says that $b_j$ is the lowest nonzero binary digit.) No proof was requested, but here's
one. Let $A(n)$ be the claim for H(n, ...). $A(1)$ is trivial. We now prove $A(n)$. If $k < 2^{n-1}$, it
follows from $S(m)$ that H(n - 1, ...) is being called and $A(n - 1)$ applies. If $k = 2^{n-1}$, then we
are executing S→G and so this case is verified. Finally, if $2^{n-1} < k < 2^n$, then H(n - 1, ...)
is being executed at step $k - 2^{n-1}$, which differs from $k$ only in the loss of its leftmost binary
bit.
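The move count in (d) and the rule in (e) can both be checked by generating the moves. The following sketch assumes washers are numbered 1 (smallest) to $n$ and uses the argument order H(n, source, spare, goal) that appears in (a); `hanoi` and `lowest_one_bit_position` are illustrative names.

```python
def hanoi(n, source="S", spare="E", goal="G"):
    """Return the moves of H(n, source, spare, goal) as (washer, from, to) triples."""
    if n == 0:
        return []
    return (hanoi(n - 1, source, goal, spare)    # clear the way: n-1 washers to spare
            + [(n, source, goal)]                # move the largest washer
            + hanoi(n - 1, spare, source, goal)) # bring the n-1 washers to goal


def lowest_one_bit_position(k):
    """Return the 1-based position j of the lowest nonzero binary digit of k >= 1."""
    j = 1
    while k % 2 == 0:
        k //= 2
        j += 1
    return j
```

Numbering the moves from 1, move $k$ of `hanoi(n)` should involve washer `lowest_one_bit_position(k)`, and the list should have $2^n - 1$ entries.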

(f) Suppose that we are looking at move $k = \cdots b_3b_2b_1$ and that washer $j$ is being moved. (That
means $b_j$ is the rightmost nonzero bit.) You should be able to see that this is move number
$\cdots b_{j+2}b_{j+1} = (k - 2^{j-1})/2^j$ for the washer. Call this number $k'$. To determine source and
destination, we must study move patterns.

The pattern of moves for a washer is either

$P_0$: S→G→E→S→G→E→··· repeating or
$P_1$: S→E→G→S→E→G→··· repeating.

Which washer uses which pattern? Consider washer $j$. It is easily verified that it is moved a
total of $2^{n-j}$ times, after which time it must be at G. A washer following $P_i$ is at G only after
move numbers of the form $3t + i + 1$ for some $t$. Thus $i + 1$ is the remainder when $2^{n-j}$ is
divided by 3. The remainder is 1 if $n - j$ is even and 2 otherwise. Thus washer $j$ follows pattern
$P_i$ where $i$ and $n - j$ have the same parity. If we look at the remainder after dividing $k'$ by 3,
we can see what the source and destination are by looking at the start of $P_i$. For those of you
familiar with congruences, the remainder is congruent to $(-1)^j k + 1$ modulo 3.

7.3.10. We are not actually reducing to a simpler problem because we cannot ignore the presence
of washer $k$ on a pole and move larger washers on top of it.

In the text, we ignored the presence of the largest washer. This is actually reducing to a simpler
problem because we can pile other washers on top of it as if it were not there.
7.3.11 (a) We have

H*(n, S, E, G)

H*(n - 1, S, E, G)    S→E    H*(n - 1, G, E, S)    E→G    H*(n - 1, S, E, G)

(b) The initial condition is $h^*_1 = 2$. For $n > 1$ we have $h^*_n = 3h^*_{n-1} + 2$.
Alternatively, $h^*_0 = 0$ and, for $n > 0$, $h^*_n = 3h^*_{n-1} + 2$.

(c) The general solution is $h^*_n = 3^n - 1$. To prove it, use induction. First, it is correct for $n = 0$.
Then, for $n > 0$,

$$h^*_n = 3h^*_{n-1} + 2 = 3(3^{n-1} - 1) + 2 = 3^n - 1.$$
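The local description in (a) can be turned into a move generator and replayed to confirm both legality and the count $3^n - 1$. This sketch assumes that H* is the variant in which every move goes to or from the middle pole E, which is what the two explicit moves S→E and E→G suggest; the names `hstar` and `replay` are hypothetical.

```python
def hstar(n, s="S", e="E", g="G"):
    """Moves of H*(n, s, e, g) as (from, to) pairs, following the local description."""
    if n == 0:
        return []
    return (hstar(n - 1, s, e, g) + [(s, e)]
            + hstar(n - 1, g, e, s) + [(e, g)]
            + hstar(n - 1, s, e, g))


def replay(n, moves):
    """Replay moves on n washers starting on S; raise ValueError on an illegal move."""
    poles = {"S": list(range(n, 0, -1)), "E": [], "G": []}
    for src, dst in moves:
        if "E" not in (src, dst):
            raise ValueError("move does not involve the middle pole")
        washer = poles[src].pop()
        if poles[dst] and poles[dst][-1] < washer:
            raise ValueError("larger washer placed on a smaller one")
        poles[dst].append(washer)
    return poles
```

Replaying `hstar(n)` should leave all washers on G, largest at the bottom, after exactly $3^n - 1$ moves.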

7.3.12 (a) You may not have taken care of all the initial conditions in your code if you didn't state
the recursion carefully. We'll use

$$S(n, k) = S(n-1, k-1) + kS(n-1, k) \quad \text{when } n > 0 \text{ and } k > 1$$

with the initial conditions $S(0, k) = 0$ for $k > 0$ and $S(n, 1) = 1$ for $n > 0$.

It will be useful to define two operations. Let $\mathcal{P}$ be a collection of partitions of a set not
containing $t$. Define Add($\mathcal{P}$, t) to be the collection of partitions $P \cup \{\{t\}\}$ where $P \in \mathcal{P}$; that is,
the result of adding the block $\{t\}$ to each partition in $\mathcal{P}$. Define Ins($\mathcal{P}$, t) to be the collection
of partitions $(P - \{B\}) \cup \{B \cup \{t\}\}$ where $B \in P \in \mathcal{P}$; that is, the result of adding $t$ to one of
the blocks of each partition in $\mathcal{P}$. Note that if $P$ has $k$ blocks, then $t$ is added to each in turn
producing $k$ new partitions.

S(T, k)
    /* Do the S(0, k) case. */
    If T = ∅, Return ∅.
    /* Do the S(n, 1) case. */
    If k = 1, then Return {T}.
    Select t ∈ T and let U = T - {t}.
    Return Add(S(U, k - 1), t) ∪ Ins(S(U, k), t)
End

(b) The three cases shown here are $T = \emptyset$, $k = 1$ and the rest; i.e., $T \ne \emptyset$ and $k > 1$. As in the code,
$U = T - \{t\}$ for some $t \in T$.

S(∅, k)    S(T, 1)    S(T, k)

   ∅         {T}      Add(S(U, k - 1), t)   Ins(S(U, k), t)
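The procedure S(T, k) can be sketched with frozensets, which makes it possible to count the partitions it produces and compare with known Stirling numbers of the second kind. This is an illustration under stated assumptions: a partition is represented as a frozenset of frozenset blocks, the result for $k = 1$ is the collection containing the single partition whose only block is $T$, and `partitions_into_blocks` is a made-up name.

```python
def partitions_into_blocks(T, k):
    """Return the set of all partitions of T into exactly k blocks (k >= 1)."""
    T = frozenset(T)
    if not T:                       # the S(0, k) case
        return set()
    if k == 1:                      # the S(n, 1) case: one partition, one block
        return {frozenset([T])}
    t = next(iter(T))               # select any t in T
    U = T - {t}
    # Add: put t in a new block of its own.
    added = {P | {frozenset([t])} for P in partitions_into_blocks(U, k - 1)}
    # Ins: insert t into each block of each partition in turn.
    inserted = {(P - {B}) | {B | {t}}
                for P in partitions_into_blocks(U, k) for B in P}
    return added | inserted
```

For example, `len(partitions_into_blocks(range(4), 2))` should be $S(4, 2) = 7$.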

7.3.13  (b)    Induct  on  n.  It  is  true  for  n  =  1.  If  n  >  1,  a2, . . . ,  a„  G  G{k2,  ■  ■  ■ ,       by  the  induction 

hypothesis.  Thus  ai,  02, . . . ,  a„  is  in  ai, if  and  ai,  R{H). 

(c) Induct on $n$. It is true for $n = 1$. Suppose $n > 1$ and let the adjacent leaves be $b_1, \ldots, b_n$ and
$c_1, \ldots, c_n$, with $c$ following $b$. If $b_1 = c_1$, then apply the induction hypothesis to $G(k_2, \ldots, k_n)$
and the sequences $b_2, \ldots, b_n$ and $c_2, \ldots, c_n$. If $b_1 \ne c_1$, it follows from the local description
that $c_1 = b_1 + 1$, that $b_2, \ldots, b_n$ is the rightmost leaf in $H$ (or $R(H)$) and that $c_2, \ldots, c_n$ is the
leftmost leaf in $R(H)$ (or $H$, respectively). In either case, $b_2, \ldots, b_n$ and $c_2, \ldots, c_n$ are equal
because they are the same leaf of $H$.

(d)  Let  i?„(a)  be  the  rank  of  ai, . . .  ,an-  Clearly  Ri{a)  —  ai  —  1.  If  n  >  1  and  ai  =  1,  then 
Rn{ct)  =  Rn-i{ct2,  ■  ■  ■ ,  If  n  >  1  and  ai  =  2,  then  -R„(a)  =  2"  —  1  —  i?„_i(a2, . . . ,  a„). 
Letting  Xi  =  ai  —  1,  we  have  -R„(a)  =  (2"  —  l)xi  +  (— l)^^i?„_i(a2, . . . ,  a„)  and  so 

i?„(a)  =  (2"-l)a;i  +  (-l)"H2""'-l)a;2  +  (-l)"^+"H2"-2-l)x3  +  -  •  •  + 

(e)  If  you  got  this,  congratulations.  Let  j  be  as  large  as  possible  so  that  ai, . . .  ,0:^  contains  an 
even  number  of  2's.  Change  aj.  (Note:  If  j  =  0,  a  =  2, 1, . . . ,  1,  the  sequence  of  highest  rank, 
and  it  has  no  successor.) 

7.3.14.  Your  answer  here  will  depend  on  exactly  how  you  set  up  your  procedures.  In  the  procedures 
we  have  written,  information  on  where  to  return,  perhaps  some  temporary  storage  for  compiler 
generated  variables,  and  the  following  variables  will  all  go  onto  the  stack. 

1.  The  integers  m,  q  and  r. 

2.  The  integers  n,  q  and  r. 
4.  The  integer  n. 

7. The sets T and U, the integer k and the element t. This pseudocode is rather far from actual
code  in  many  languages  because  the  procedure  returns  a  set  whose  elements  are  partitions  of 
a  set.  There  will  undoubtedly  be  some  storage  associated  with  this,  perhaps  in  the  form  of  a 
linked  list  of  pointers.  As  a  result  of  replacing  the  pseudocode  with  code,  we  would  probably 
create  a  few  additional  variables  that  are  pointers. 

Section  7.4 

7.4.1. Let $M(n)$ be the minimum number of multiplications needed to compute $x^n$. We leave it to
you to verify the following table for $n \le 9$.

    n       2   3   4   5   6   7   8   9   15   21   47   49

    M(n)    1   2   2   3   3   4   3   4    5    6    8    7

Since $15 = 3 \times 5$, it follows that $M(15) \le M(3) + M(5) = 5$. Likewise, $M(21) \le M(3) + M(7) = 6$.
Since the binary form of 49 is $110001_2$, $M(49) \le 7$. Since $47 = 101111_2$, we have $M(47) \le 9$, but
we can do better. Using $47 = 2 \times 23 + 1$ gives $M(47) \le M(23) + 2$, which we leave for you to work
out. A better approach is given by $47 = 5 \times 9 + 2$. Since $x^2$ is computed on the way to finding $x^9$,
it is already available and so $M(47) \le M(5) + M(9) + 1 = 8$. It turns out that these are minimal,
but we will not prove that.

7.4.2 (a) Starting with $F_0$, the values are 0, 1, 1, 2, 3, 5, 8 and 13.

(b) The equations follow immediately from the recursion and the definition of $M$. To compute $F_n$,
use the ideas in Example 7.21 to calculate $P = M^{n-1}$ rapidly. Then $F_n = (Pv_0)_2 = P_{2,2}$.

Another approach is to use linear algebra. If $\lambda_1, \lambda_2$ are the eigenvalues of $M$, then

$$M = Q \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} Q^{-1} \quad \text{for some matrix } Q \text{ and so} \quad M^n = Q \begin{pmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{pmatrix} Q^{-1}.$$

(c) Write $R = M^n$. Since $(F_n, F_{n+1})^t = R(0, 1)^t$, $F_n = R_{1,2}$ and $F_{n+1} = R_{2,2}$. Since

$$(F_{n+1}, F_{n+2})^t = Rv_1 = R(1, 1)^t,$$

$F_{n+1} = R_{1,1} + R_{1,2}$ and $F_{n+2} = R_{2,1} + R_{2,2}$. Subtract the two earlier equations from these
and use the recursion for the Fibonacci numbers.

(d) This follows from the previous part and, for $F_{2n}$, the rearranged recursion $F_n = F_{n+1} - F_{n-1}$.
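The rapid computation of $M^n$ mentioned in (b) can be sketched with repeated squaring. The block below assumes $M = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$, which is consistent with $F_n = R_{1,2}$ and $F_{n+1} = R_{2,2}$ in (c) but is not stated explicitly in the solution; `mat_mul`, `mat_pow` and `fib` are illustrative names.

```python
def mat_mul(A, B):
    """Multiply two 2x2 integer matrices given as nested lists."""
    return [
        [A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
        [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]],
    ]


def mat_pow(M, n):
    """Compute M**n for n >= 0 using about log2(n) squarings."""
    result = [[1, 0], [0, 1]]  # identity matrix
    while n > 0:
        if n & 1:
            result = mat_mul(result, M)
        M = mat_mul(M, M)
        n >>= 1
    return result


def fib(n):
    """Return F_n as the (1,2) entry of R = M**n, with M assumed as above."""
    R = mat_pow([[0, 1], [1, 1]], n)
    return R[0][1]
```

With 0-based list indices, `R[0][1]` is the entry written $R_{1,2}$ in the text.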




7.4.3.  Let  Vn  be  the  transpose  of  {a., 


'711 


Un+k-i)-  Then  tT„  =  Mw„_i  where 


1 
0 


0 

1 


0 


0 


M  = 


0 

\ak 


0 


0 


7.4.4 (a) Divide the coins into three nearly equal piles, $P_1$, $P_2$ and $P_3$, in such a way that the
first two piles have an equal number of coins. Compare the first two piles in the scales. If they
differ in weight, the counterfeit coin is in the lighter pile. If they have the same weight, the
counterfeit coin is in $P_3$.

(b) There are $n$ possibilities for the counterfeit coin. Construct the decision tree for the algorithm,
labeling the leaves with the identity of the counterfeit coin. Thus, there must be at least $n$
leaves. Since weighing has three possible outcomes (equal, heavier and lighter), each vertex in
the tree has at most three sons. In the best possible case, each nonleaf vertex will have three
sons with at most one exception which has two sons. Also, the sons of this exception are leaves
and no leaves differ in height by more than one. (These facts can be proved as they are for
sorting.) The rest of the argument is like the proof of the sorting theorem.

(c) Proceed as in the known case; however, compare $P_1$ with $P_2$ and compare $P_1$ with $P_3$. This
determines if the counterfeit is lighter or heavier and which of the three piles it is in. The rest
of the weighings now proceed as in the case where the counterfeit was known to be lighter.
To see that the relative weight of the counterfeit is determined simply consider the possible
cases as shown in the table, where $w_i$ is the weight of $P_i$ and an entry $i,j$ indicates that the
counterfeit is in pile $P_i$ and it is lighter or heavier according as $j$ = L or $j$ = H.

(d) One possibility is to again divide the coins into three piles. If $w_1 \ne w_2$, the heavier pile contains
no counterfeits and so can be removed. Merge the other two piles and start again. If $w_1 = w_2$,
then either $w_1 > w_3$ or $w_1 < w_3$. In the former case, both counterfeit coins are in $P_3$, so we
can start again with $P_3$. If $w_1 < w_3$, one coin is in each of $P_1$ and $P_2$ and they can be searched
separately by our earlier strategy.
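The strategy of part (a) can be simulated to see how many weighings it uses. The sketch below assumes exactly one coin is lighter than the others and represents a weighing by comparing the total weights of two piles; the function name `find_light_coin` and the pile-size rule `(n + 1) // 3` are choices made here, not taken from the text.

```python
def find_light_coin(weights):
    """Return (index of the lighter coin, number of weighings used).
    Assumes exactly one entry of weights is smaller than all the others."""
    candidates = list(range(len(weights)))
    weighings = 0
    while len(candidates) > 1:
        a = (len(candidates) + 1) // 3  # size of each of the two piles weighed
        p1, p2, p3 = candidates[:a], candidates[a:2 * a], candidates[2 * a:]
        w1 = sum(weights[i] for i in p1)
        w2 = sum(weights[i] for i in p2)
        weighings += 1
        if w1 < w2:
            candidates = p1       # the lighter pile holds the counterfeit
        elif w2 < w1:
            candidates = p2
        else:
            candidates = p3       # equal piles: it is in the pile set aside
    return candidates[0], weighings
```

Each weighing leaves at most about a third of the coins, so the count stays within the smallest $w$ with $3^w \ge n$, in line with the decision tree bound of part (b).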

7.4.5.  Finding  a  maximum  of  n  items  can  be  done  in  0(n).  so  it's  the  computation  of  all  the 
different  F{v)  values  is  the  problem.  Thus  we  could  compute  the  values  of  F  separately  from  finding 
the  maximum.  However,  since  it's  convenient  to  compute  the  maximum  while  we're  computing  the 
values  of  F,  we'll  do  it. 

The root r of T has two sons, say $s_L$ and $s_R$. Observe that the answer for the tree rooted at r
must be either the answer for the tree rooted at $s_L$, or the answer for the tree rooted at $s_R$ or $F(r)$.


(Table for part (c) of the preceding exercise.)

             w1 < w3    w1 = w3    w1 > w3
  w1 < w2    1,L        2,H        never
  w1 = w2    3,H        never      3,L
  w1 > w2    never      2,L        1,H

Also 


$F(r) = f(r) + F(s_L) + F(s_R)$.


Here's  an  algorithm  that  carries  out  this  idea. 
/* r is the root of the tree in what follows. */

Procedure BestSum(r)
    Call Recur(r, Fvalue, best)
    Return best
End

Procedure Recur(r, F, best)
    If r is a leaf then
        F = f(r)
        best = f(r)
    Else
        Let s_L and s_R be the sons of r.
        Recur(s_L, F_L, b_L)
        Recur(s_R, F_R, b_R)
        F = f(r) + F_L + F_R
        best = max(F, b_L, b_R)
    End if
    Return
End

Since $f(r)$ is only used once, the running time of this algorithm is $O(n)$ for n vertices, an improvement
over $\Theta(n \ln n)$.
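The procedure can be sketched in Python as follows. This is only an illustration under assumed conventions: the `Node` class, its field names and the function names are hypothetical and not part of the text, and every nonleaf vertex is assumed to have exactly two sons.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """A vertex with weight f and either zero or two sons (assumed layout)."""
    f: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def recur(r: Node) -> Tuple[float, float]:
    """Return (F(r), best), where best is the largest F(v) in the subtree at r."""
    if r.left is None and r.right is None:
        return r.f, r.f
    F_L, b_L = recur(r.left)
    F_R, b_R = recur(r.right)
    F = r.f + F_L + F_R          # F(r) = f(r) + F(s_L) + F(s_R)
    return F, max(F, b_L, b_R)   # each vertex is handled once


def best_sum(r: Node) -> float:
    return recur(r)[1]


tree = Node(-5, Node(3), Node(2, Node(4), Node(-1)))
print(best_sum(tree))
```

Returning the pair (F, best) plays the role of the output parameters F and best in the pseudocode.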

7.4.6. Now, instead of an interval that is divided into two pieces, a fast algorithm will have to
allow  for  a  rectangular  region  that  can  be  divided  in  two  either  horizontally  or  vertically.  If  you 
imagine  putting  together  the  results  of  a  horizontal  and  a  vertical  division,  you  will  see  that  the  best 
rectangular  region  to  sum  over  may  lie  entirely  within  one  of  the  pieces,  may  overlap  two  of  them  or 
may  overlap  all  four  of  them.  To  allow  for  all  this  efficiently,  it  turns  out  that  vectors  of  information 
are  needed.  We'll  leave  it  to  you  to  work  further  if  you're  interested. 

Section  8.1 

8.1.1.  This  is  exactly  the  situation  in  the  text,  except  that  there  is  now  one  additional  question 
when  one  reaches  a  leaf. 

8.1.2.  There  are  only  two  rooted  binary  trees  with  4  leaves.  (These  are  just  rooted — not  planar.) 

They are the tree in which all four leaves have height two and the tree whose leaves, from left to
right, have heights 3, 3, 2 and 1. Every path to a leaf in the first tree has length two and so AC(T) = 2. If the
frequencies are $f_1, \ldots, f_4$ from left to right, the average cost for the second tree is $3f_1 + 3f_2 + 2f_3 + f_4$,
which in our case is 1.9 < 2.

8.1.3  (a)  If  T  is  not  a  full  binary  tree,  there  is  some  vertex  v  that  has  only  one  child,  say  s.  Shrink 
the edge $\{v, s\}$ so that v and s become one vertex and call the new tree $T'$. If there are k leaves
in the subtree whose root is v, then $TC(T') = TC(T) - k$.

(b)  We  follow  the  hint.  Let  k  be  the  number  of  leaves  in  the  subtree  rooted  at  v.  Since  T  is  a  binary 
tree and v is not a root, $k \ge 2$. Let $d = h(v) - h(l_2)$ and note that $d = (h(l_1) - 1) - h(l_2) \ge 1$.
The distance to the root of every vertex in the subtree rooted at v is decreased by d and the
distance of $l_2$ to the root is increased by d. Thus TC is decreased by $kd - d > 0$.

(c)  By  the  discussion  in  the  proof  of  the  theorem,  we  know  that  the  height  of  T  must  be  at  least  m 
because a binary tree of height $m - 1$ or less has at most $2^{m-1}$ leaves. Suppose T had height $M > m$.
By the previous part of this exercise, the leaves of T have heights M and, perhaps, $M - 1$.
Thus, every vertex v with $h(v) < M - 1$ has two children. It follows that T has $2^{M-1}$ vertices
w with $h(w) = M - 1$. If these were all leaves, T would have $2^{M-1} \ge 2^m$ leaves; however, at
least one vertex u with $h(u) = M - 1$ is not a leaf. Since it has two children, T has at least
$2^m + 1$ leaves, a contradiction.

(d) By the previous two parts, all leaves of T have height at most m. If $T'$ is a principal subtree of
T, its leaves have height at most $m - 1$ in $T'$. Hence $T'$ has at most $2^{m-1}$ leaves.

The argument hints at how to construct the desired tree: Construct $T'$, a principal subtree
of T, having all its leaves at height $m - 1$ in $T'$. Construct a binary tree $T''$ having $n - 2^{m-1}$
leaves such that TC($T''$) is as small as possible. The principal subtrees of T will be $T'$ and $T''$.

8.1.4.  With  one  exception,  the  proof  in  the  text  can  be  used  with  k  replacing  2.  The  exception 
occurs  when  we  consider  the  principal  subtrees  of  T.  As  in  the  text,  we  can  dismiss  the  case  of  only 
one principal subtree. We must deal with d principal subtrees where $1 < d \le k$. (When $k = 2$, it
follows  that  d  =  2  as  in  the  text.)  The  function  whose  minimum  is  given  in  the  exercise  replaces 
f{x)  in  the  text. 

8.1.5 (a) Suppose the answer is $S_n$. Clearly $S_0 = 1$ since the root is the only vertex. We need a
recursion for $S_n$. One approach is to look at the two principal subtrees. Another is to look at what
happens when we add a new "layer" by replacing each leaf with a vertex that has two leaves as sons.

For the first approach, $S_n = 1 + 2S_{n-1}$, where each $S_{n-1}$ is due to a principal
subtree and the 1 is due to the root. The result follows by induction:

$S_n = 1 + 2S_{n-1} = 1 + 2(2^n - 1) = 2^{n+1} - 1$.

For the second approach, $S_n = S_{n-1} + 2^n$ and so $S_n = (2^n - 1) + 2^n = 2^{n+1} - 1$. By the way,
if we have both recursions, we can avoid induction since we can solve the two equations

$S_n = 1 + 2S_{n-1}$       and       $S_n = S_{n-1} + 2^n$

to obtain the formula for $S_n$. Thus, by counting in two ways (the two recursions), we don't
need to be given the formula ahead of time since we can solve for it.

(b) Let the value be TC*(n). Again, we use induction and there are two approaches to obtaining
a recursion. Clearly TC*(0) = 0, which agrees with the formula.

The first approach to a recursion: Since the principal subtrees of T each store $S_{n-1}$ keys
and since the path lengths all increase by 1 when we adjoin the principal subtrees to a new
root, $TC^*(n) = 2(S_{n-1} + TC^*(n-1))$. Thus

$TC^*(n) = 2(2^n - 1 + (n-2)2^n + 2) = 2((n-1)2^n + 1) = (n-1)2^{n+1} + 2$.

For the second approach, $TC^*(n) = TC^*(n-1) + n2^n$. Again, we can prove the formula for
TC*(n) by induction or, as in (a), we can solve the two recursions directly.
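As a numerical sanity check, the closed forms and both pairs of recursions can be tested by summing over the levels of the tree. This is a sketch only: the function names are made up for illustration, and it assumes the tree in question has $2^d$ vertices at each depth d = 0, ..., n.

```python
def levels(n):
    """Yield (depth, number of vertices at that depth) for depths 0..n."""
    for d in range(n + 1):
        yield d, 2 ** d


def size(n):
    """S_n: the total number of vertices."""
    return sum(count for _, count in levels(n))


def total_cost(n):
    """TC*(n): the sum over all vertices of the distance to the root."""
    return sum(d * count for d, count in levels(n))


for n in range(1, 8):
    # The two recursions for S_n.
    assert size(n) == 1 + 2 * size(n - 1) == size(n - 1) + 2 ** n
    # The two recursions for TC*(n).
    assert total_cost(n) == 2 * (size(n - 1) + total_cost(n - 1))
    assert total_cost(n) == total_cost(n - 1) + n * 2 ** n

print(size(3), total_cost(3))
```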
[Tree diagrams. For two items: root 2:1 with leaves 21 and 12. For three items: root 2:1 with
sons 3:1 and 3:1; the left 3:1 has sons 321 and 3:2, and that 3:2 has sons 312 and 213; the right
3:1 has sons 231 and 3:2, and that 3:2 has sons 132 and 123.]

Figure S.8.1 The decision trees for binary insertion sorts. Go to the left at vertex i:j if $u_i < s_j$ and to
the right otherwise. (You may have done the reverse and gotten the mirror images. That's fine.)


Section  8.2 

8.2.1.  Here  are  the  first  few  and  the  last. 

1.  Start  the  sorted  list  with  9. 

2.  Compare  15  with  9  and  decide  to  place  it  to  the  right  giving  9,  15. 

3.  Compare  6  with  9  to  get  6,  9,  15. 

4.  Compare  12  with  9  and  then  with  15  to  get  6,  9,  12,  15. 

5.  Compare  3  with  9  and  then  with  6  to  get  3,  6,  9,  12,  15. 


16. We now have the sorted list 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16. Compare
8 with 7, with 12, with 10 and then with 9 to decide where it belongs.

8.2.2. At each comparison, we roughly halve the number of places in which $u_t$ could belong. This
halving process must continue until just one position is left. After c comparisons, we have about
$(\cdots((t/2)/2)\cdots/2) = t/2^c$ positions left. Thus $t/2^c$ is about 1 and so c is about $\log_2 t$. (This will be
exact if t is a power of 2.) The total number of comparisons needed is about $\sum_{t=1}^{n} \log_2 t$, which is
about $\int_1^n \log_2 t \, dt$, which is about $n \log_2 n$.
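To see the estimate in action, here is a small sketch of a binary insertion sort that counts comparisons. The function name and the rule for choosing the middle position are assumptions for illustration, so the items compared need not be the same ones as in the solution to 8.2.1; the input list is the one that appears on the tapes in the solution to 8.2.7.

```python
def binary_insertion_sort(items):
    """Return (sorted list, number of comparisons between items)."""
    s = []
    comparisons = 0
    for u in items:
        lo, hi = 0, len(s)       # u belongs somewhere in positions lo..hi
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if u < s[mid]:
                hi = mid
            else:
                lo = mid + 1
        s.insert(lo, u)
    return s, comparisons


data = [9, 15, 6, 12, 3, 7, 11, 5, 14, 1, 10, 4, 2, 13, 16, 8]
result, count = binary_insertion_sort(data)
print(result, count)
```

Inserting into a list with p possible positions uses at most $\lceil \log_2 p \rceil$ comparisons here, so the total for 16 items is at most 49, in line with the $n \log_2 n$ estimate.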

8.2.3. See Figure S.8.1. To illustrate, suppose the original list is 3,1,2. Thus $u_1 = 3$, $u_2 = 1$ and
$u_3 = 2$.

• We start by putting $u_1$ in the sorted list, so we have $s_1 = 3$.

• Now $u_2$ must be inserted into the list $s_1$. We compare $u_2$ with $s_1$, the 2:1 entry. Since
$s_1 = 3 > 1 = u_2$, we go to the left and our sorted list is 1, 3 so now $s_1 = 1$ and $s_2 = 3$.

• Now $u_3$ must be inserted into the list $s_1, s_2$. Since we are at 3:1, we compare $u_3 = 2$ with
$s_1 = 1$ and go to the right. At this point we know that $u_3$ must be inserted into the list $s_2$. We
compare $u_3 = 2$ with $s_2 = 3$ at 3:2 and go to the left.

8.2.4.  First  we  sort  on  the  units  digit  with  4  buckets: 

1  :  41,  21       2  :  empty       3  :  33       4  :  14,  24. 

Collect these in a list, preserving the order: 41, 21, 33, 14, 24. Now sort on the tens digit, preserving
order: 

1  :  14       2  :  21,  24       3  :  33       4  :  41. 
Collect these in a list, preserving the order: 14, 21, 24, 33, 41.
8.2.5 (a) Suppose that the alphabet has L letters and let the ith letter (in order) be $a_i$. Let $u_j$ be
a word with exactly k letters. The following algorithm sorts $u_1, \ldots, u_n$ and returns the result
as $x_1, \ldots, x_n$.

BUCKET(u_1,...,u_n)
    Copy u_1,...,u_n to x_1,...,x_n.
    /* t is the position in the word. */
    For t = k to 1
        /* Make the buckets. */
        Create L empty ordered lists.
        For j = 1 to n
            If the tth letter of x_j is a_i,
            then place x_j at the end of the ith list.
        End for
        Copy the ordered lists to x_1,...,x_n, starting with
        the first item in the first list and ending with the
        last item in the Lth list.
    End for
End


(b)    Extend  all  words  to  k  letters  by  introducing  a  new  letter  called  "blank"  which  precedes  all 
other  letters  alphabetically.  Apply  the  algorithm  in  (a). 
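Parts (a) and (b) can be combined in a short Python sketch. The function name, its parameters and the choice of a space as the "blank" letter are assumptions for illustration; the sketch also assumes that the blank does not occur in any word.

```python
def bucket_sort(words, alphabet, blank=" "):
    """Sort words over `alphabet` (given in order) by repeated bucketing."""
    if not words:
        return []
    k = max(len(w) for w in words)
    order = blank + alphabet               # blank precedes every letter
    xs = [w.ljust(k, blank) for w in words]
    for t in range(k - 1, -1, -1):         # last position first
        buckets = {letter: [] for letter in order}
        for x in xs:
            buckets[x[t]].append(x)        # append keeps the earlier order
        xs = [x for letter in order for x in buckets[letter]]
    return [x.rstrip(blank) for x in xs]


print(bucket_sort(["41", "21", "33", "14", "24"], "1234"))
print(bucket_sort(["b", "ab", "a", "ba"], "ab"))
```

The first call repeats the two passes worked out in the solution to 8.2.4.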

8.2.6. We only do the first algorithm since the second is built from it in a simple fashion. Let $A(p)$
be the assertion that after p steps the words $x_1, \ldots, x_n$ are in order if we ignore the first $k - p$ letters
of each word. To prove $A(0)$, note that we are ignoring all the letters and so any order is fine. We
now prove $A(p)$ for $k \ge p > 0$ using $A(p-1)$. Look at the situation after arranging the $x_j$'s in the
L ordered lists. If two words have different pth letters, they appear in different lists, the one with
the alphabetically earlier letter appearing in the earlier list. Therefore, they will be in proper order
when the L lists are copied into the $x_j$'s. If two words have the same pth letter, they appear in the
same list and their order in that list is the same as it was before they were placed there. Since that
order was correct for their last $p - 1$ letters by $A(p-1)$, they will be in the proper order when copied
into the $x_j$'s.

8.2.7. First divide the list into two equally long tapes, say

A: 9, 15, 6, 12, 3, 7, 11, 5       B: 14, 1, 10, 4, 2, 13, 16, 8.

Think of each tape as containing a series of 1 long (sorted) lists. (The commas don't appear on the
tapes, they're just there to help you see where the lists end.) Merge the first half of each tape, list
by list, to tape C and the last halves to D. This gives us the following tapes containing a series of 2
long sorted lists:

C: 9 14, 1 15, 6 10, 4 12       D: 2 3, 7 13, 11 16, 5 8.

Now we merge these 2 long lists to get 4 long lists, writing the results on A and B:

A: 2 3 9 14, 1 7 13 15       B: 6 10 11 16, 4 5 8 12.

Merging back to C and D gives

C: 2 3 6 9 10 11 14 16       D: 1 4 5 7 8 12 13 15.

These are merged to produce one 16 long list on A and nothing on B.
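The passes above can be replayed with a short simulation. This is a sketch under stated assumptions: tapes are modelled as Python lists of sorted lists, the function names are hypothetical, and the number of items is assumed to be a power of 2.

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def tape_merge_sort(items):
    """Return (sorted list, list of (tape, tape) pairs after each pass)."""
    n = len(items)
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes a power of 2, at least 2")
    first = [[x] for x in items[: n // 2]]
    second = [[x] for x in items[n // 2:]]
    passes = [(first, second)]
    while second:
        merged = [merge(a, b) for a, b in zip(first, second)]
        half = (len(merged) + 1) // 2
        first, second = merged[:half], merged[half:]
        passes.append((first, second))
    return first[0], passes


data = [9, 15, 6, 12, 3, 7, 11, 5, 14, 1, 10, 4, 2, 13, 16, 8]
result, passes = tape_merge_sort(data)
for tape1, tape2 in passes:
    print(tape1, tape2)
```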
8.2.8. Read the items in k at a time, sort the k items and write them out onto tapes A and B,
the first k onto A, the next k onto B, the next k onto A, etc. Now do a merge sort using A and B,
starting with k long lists on each tape instead of 1 long lists. Note that if $k = 2^m$, then we have
avoided m iterations of the merge sort.

8.2.9. A split requires $n - 1$ comparisons since the chosen item must be compared with every other
item in the list. In the worst case, we may split an n long list into one of length 1 and another of
length $n - 1$. We then apply Quicksort to the list of length $n - 1$. If $W(n)$ comparisons are needed,
then $W(1) = 0$ and $W(n) = n - 1 + W(n-1)$ for $n > 1$. Thus $W(n) = \sum_{k=1}^{n-1} k = n(n-1)/2$.

Suppose that $n = 2^k$ and the lists are split evenly. Let $E(k)$ be the number of comparisons.
Since Quicksort is applied to two lists of length $n/2$ after splitting, $E(k) = n - 1 + 2E(k-1)$ for
$k > 0$ and $E(0) = 0$. A little computation gives us $E(1) = 1$, $E(2) = 5$, $E(3) = 17$, $E(4) = 49$ and
$E(5) = 129$. From the statement of the problem we expect $E(k)$ to be near $k2^k$, which has the values
0, 2, 8, 24 and 64. Comparing these sequences we discover that $E(k) = 2(k-1)2^{k-1} + 1$ for $k < 6$.
This is easily proved by induction.
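Both recursions are easy to check numerically. The following sketch simply iterates them; the function names are made up for illustration.

```python
def worst_case(n):
    """W(n) from W(1) = 0 and W(n) = n - 1 + W(n - 1)."""
    w = 0
    for m in range(2, n + 1):
        w += m - 1
    return w


def even_split(k):
    """E(k) from E(0) = 0 and E(k) = 2^k - 1 + 2 E(k - 1)."""
    e = 0
    for j in range(1, k + 1):
        e = 2 ** j - 1 + 2 * e
    return e


print([worst_case(n) for n in range(1, 7)])
print([even_split(k) for k in range(6)])
```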

8.2.10 (a) The number of comparisons is actually $(n - 1) + Q(rn) + Q((1-r)n)$ by the same
argument used in the previous exercise.

(b) Suppose the claim is true for $rn$ and $(1-r)n$, which are both less than n. By the previous
part, $Q(n)$ is about

$n + arn\ln(rn) + a(1-r)n\ln((1-r)n)$
$= n + arn\ln n + arn\ln r + a(1-r)n\ln n + a(1-r)n\ln(1-r)$
$= an\ln n + n[1 + ar\ln r + a(1-r)\ln(1-r)]$.

If this is to equal $an\ln n$, the factor $[\cdots]$ must be 0. (This is not a rigorous inductive proof
because we've loosely thrown around the word "about.")

(c) By the previous problem, the answer is about $n\log_2 n$. Our equation for a is $1 + a\ln(1/2) = 0$ and
so $a = 1/\ln 2$. Since $(n\ln n)/(\ln 2) = n\log_2 n$, we are done.

(d) The equation for a is $1 - a(\ln 3 + 2\ln(3/2))/3 = 0$ and so $a = 3/\ln(27/4)$. If we are using
natural logarithms, $a \approx 1.57$.


Section  8.3 

8.3.1.  Since  there  are  only  3  things,  you  cannot  compare  more  than  one  pair  of  things  at  any  time. 
By Theorem 8.1, we need at least $\log_2(3!)$ comparisons; i.e., at least three. A network with three
comparisons  that  sorts  is  given  in  Figure  8.2. 

8.3.2.  See  the  previous  solution.  We  need  at  least  log2(4!)  comparisons  (i.e.,  we  need  five  or  more). 
We  can  compare  two  pairs  at  the  same  time,  so  the  fastest  network  must  take  at  least  three  time 
units.  Here  is  a  network  that  does  it. 


To  prove  that  it  sorts  you  could  look  at  all  4!  =  24  possible  permutations  of  the  inputs.  It's  easier 
to  use  the  Zero-One  Principle. 

8.3.3.  As  argued  in  the  previous  two  solutions,  we  will  need  at  least  seven  comparisons  and  we  can 
do  at  least  two  per  time.  This  means  it  will  take  at  least  four  time  units.  It  has  been  shown  (but 
not  in  this  text!)  that  at  least  five  time  units  are  required.  A  brick  wall  sort  works. 
8.3.4. If you have an upper or lower bound on the values to be sorted, you can "pad out" the list
with  one  of  these  values  to  obtain  n  items.  The  added  items  will  be  at  one  end.  What  if  you  don't 
know  such  a  bound?  Now  it  depends  on  your  situation  a  bit  more.  By  one  pass  through  the  items 
one  can  find  an  upper  (or  lower)  bound.  If  this  is  not  desirable,  you  can  pad  the  list  with  copies  of 
anything  and  then  pass  the  sorted  items  through  a  hardware  or  software  device  to  remove  the  pads. 

8.3.5.  One  possibility  is  the  type  of  network  shown  in  Figure  8.3.  For  n  inputs,  this  has 

$1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2}$

comparators.  It  was  noted  in  the  text  that  a  brick  wall  for  n  items  must  have  length  n.  If  n  is  even 
there  are  {n/2){n—  1)  comparators  and  if  n  is  odd  there  are  n((n—  l)/2)  comparators.  Thus  this  is 
the same as Figure 8.3. We don't know if it can be done with fewer.

8.3.6 (a) If the network is used k times, it is the same as 2k time units of the brick wall. To see
this, write down a new copy of the network each time you are supposed to feed it back in. For
sorting to have taken place for all inputs, we must have $2k \ge n$. It follows that $f(n) = n$ for
n even and $f(n) = n + 1$ for n odd.

(b)  This  problem  was  discussed  in  the  first  exercise  for  a  general  network.  There  is  a  new  twist 
here  though.  Suppose  we  wish  to  sort  m  items.  Feed  them  in  as  the  top  m  inputs  and  pad  out 
the last $n - m$ inputs with an upper bound. The network will be done after being run $f(m)$
times.  Another  twist:  forget  the  padding  and  disable  the  appropriate  lower  comparators.  This 
works  because  the  network  for  the  first  m  lines  is  also  a  brick  wall. 

(c)  If  you  write  down  what  the  comparators  are,  you  will  see  that  this  is  the  brick  wall  again  in 
disguise;  however,  the  wires  are  now  "rotating"  around.  We  need  to  know  when  they've  gotten 
back  to  their  starting  positions.  That  happens  after  n  shifts.  Thus  we  have  sorted  output  after 
n  time  units  provided  we  read  what  would  be  fed  in  as  new  input  next  time.  If  we  read  the 

output,  n  +  1  time  units  are  required. 

8.3.7.  By  the  Adjacent  Comparisons  Theorem,  we  need  only  check  the  sequence  n, ...  ,2, 1.  Using 
the  argument  that  proves  the  Zero-One  Principle,  it  follows  that  this  sequence  is  sorted  if  and  only 
if  all  sequences  that  consist  of  a  string  of  ones  followed  by  a  string  of  zeroes  are  sorted. 

8.3.8 (a) [Network diagram on five lines. Each column below lists the values on the lines, from
top to bottom, at the input and after each successive time unit.]

5    4    4    2    2    1
4    5    2    4    1    2
3    2    5    1    4    3
2    3    1    5    3    4
1    1    3    3    5    5

If  you  start  at  the  leftmost  5,  you  can  follow  vertical  and  horizontal  lines  so  that  you  pass  just 
the  fives  and  end  at  the  rightmost  5.  Imagine  that  the  network  is  made  of  bars  and  standing 
upright  as  it  is  pictured.  Now  remove  all  the  bars  along  which  you  have  travelled  and  gently 
lower  the  upper  portion  so  that  it  rests  on  the  lower  one  at  the  same  time  sliding  it  to  the  left 
one  time  unit  (or  comparator).  The  result  is  a  network  that  is  sorting  the  sequence  4,  3, 2, 1  by 
adjacent  comparisons;  in  fact,  the  network  is  a  brick  wall.  This  will  be  clearer  if  you  copy  the 
network  to  a  piece  of  paper,  cut  it  along  the  path  and  slide  the  pieces  together. 
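A brick wall is easy to simulate, which gives a way to experiment with these arguments. In the sketch below, the function name and the convention that the first time unit compares lines 1 and 2, 3 and 4, and so on are assumptions for illustration.

```python
from itertools import permutations


def brick_wall(values):
    """Run n time units of a brick wall on n lines; return all n + 1 columns."""
    lines = list(values)
    n = len(lines)
    history = [tuple(lines)]
    for t in range(n):
        # Alternate between comparators on lines (1,2),(3,4),... and (2,3),(4,5),...
        for i in range(t % 2, n - 1, 2):
            if lines[i] > lines[i + 1]:
                lines[i], lines[i + 1] = lines[i + 1], lines[i]
        history.append(tuple(lines))
    return history


for column in brick_wall([5, 4, 3, 2, 1]):
    print(column)

# Checking the reversed sequence is enough in theory; here we try every input.
assert all(brick_wall(p)[-1] == (1, 2, 3, 4) for p in permutations([1, 2, 3, 4]))
```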

8.3.9.  It  is  evident  that  the  idea  in  the  solution  to  (a)  of  the  previous  exercise  works  for  any  n.  This 
can  be  used  as  the  basis  of  an  inductive  proof. 

An alternative proof can be given using sequences that consist of ones followed by zeroes. (See
Exercise 8.3.7.) Note that when the lowest 1 starts moving down through the comparators, it moves
down one line each time unit until it reaches the bottom. The 1 immediately above it starts the same
process one time unit later. The 1 immediately above this one starts one more time unit later, and
so forth. If there are j ones and the lowest 1 reaches the bottom after an exchange at time t, then
the next 1 reaches proper position after an exchange at time $t + 1$. Continuing in this way, all ones
are in proper position after the exchanges at time $t + j - 1$. Suppose the jth 1 (i.e., lowest 1) starts
moving by an exchange at time i. Since it reaches position after $n - j$ exchanges, $t = i + (n - j) - 1$.
Thus all ones are in position after the exchanges at time $(i + (n - j) - 1) + j - 1 = n + i - 2$.
The jth 1 starts moving when it is compared with the line below it. This happens at time 1 or 2.
Thus $n + i - 2 \le n$.

8.3.11.  Use  induction  on  n.  For  n  =  2^,  it  works.  Suppose  that  it  works  for  all  powers  of  2  less  than 
$2^t$. We use the variation of the Zero-One Principle mentioned in the text. Suppose that the first
half of the $x_i$'s contains $\alpha$ zeroes and the second half contains $\beta$ zeroes. BMERGE calls BMERGE2 with
$k = j = 2^{t-1}$. By the induction assumption, BMERGE2 rearranges the "odd" sequence $x_1, x_3, \ldots, x_{2^t-1}$
in order and the "even" sequence $x_2, x_4, \ldots, x_{2^t}$ in order. The number of zeroes in the odd sequence
minus the number of zeroes in the even sequence is 0, 1 or 2, depending on how many of $\alpha$ and $\beta$
are odd. When the difference is 0 or 1, the result of BMERGE2 is sorted. Otherwise, the last zero in
the odd sequence, $x_{\alpha+\beta+1}$, is after the first one in the even sequence, $x_{\alpha+\beta}$, and all other $x_j$'s are in
order. The comparator in BMERGE with $i = (\alpha + \beta)/2$ fixes this.

8.3.12. This is essentially the same as when n is a power of 2; however, one must be a bit careful
with the indices k and j.

8.3.13 (a) Since a one long list is sorted, nothing is done and so $S(0) = 0$. The two recursive calls
of BSORT can be implemented by a network in which they run during the same time interval.
This can then be followed by the BMERGE and so $S(N) \le S(N-1) + M(N)$.

(b) As for $S(0) = 0$, $M(0) = 0$ is trivial. Since all the comparators mentioned in BMERGE can be
run in parallel at the same time, $M(N) \le M(N-1) + 1$.

(c) From the previous part, it easily follows by induction on N that $M(N) \le N$. Thus
$S(N) \le S(N-1) + N$ and the desired result follows by induction on N.

(d) If $2^{N-1} < n \le 2^N$, then the above ideas show that $S(n) \le N(N+1)/2$. Thus

$S(n) < \frac{1}{2}(1 + \log_2 n)(2 + \log_2 n)$.

Section  9.1 

9.1.1.  We  do  PREV(T). 

PREV(T) 

Let  r  be  the  root  of  T 

Let T_1,...,T_k be the principal subtrees of T
Output r

For i = 1,...,k PREV(T_i)

End 

9.1.2. In the theorem, Step 4 corresponds to a return from a recursive call and Step 5 corresponds
to  a  recursive  call. 

9.1.3.  We  give  pseudocode  for  vertex  visitation. 
BFV(T)
    Initialize queue
    INQUEUE(T)
    While queue not empty
        S = OUTQUEUE()
        Let r be the root of S
        Let S_1,...,S_k be the principal subtrees of S
        Output r
        For i = 1,...,k INQUEUE(S_i)
    End while
End

9.1.4.  Here  is  the  pseudocode. 

PREV(T) 

Initialize  stack 

PUSH(T) 

While  stack  not  empty 
S  =  POP( ) 

Let  r  be  the  root  of  S 

Let S_1,...,S_k be the principal subtrees of S
Output r

For i = k,...,1 PUSH(S_i)
End  while 

End 
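The three traversal procedures (recursive PREV, BFV with a queue, and PREV with a stack) might look as follows in Python. The representation of a tree as a pair (root, list of principal subtrees) and the function names are assumptions made for this sketch.

```python
from collections import deque


def prev_recursive(tree):
    """Preorder vertex list, following the recursive PREV."""
    root, subtrees = tree
    out = [root]
    for s in subtrees:
        out.extend(prev_recursive(s))
    return out


def bfv(tree):
    """Breadth-first vertex list, using a queue of subtrees."""
    out, queue = [], deque([tree])
    while queue:
        root, subtrees = queue.popleft()
        out.append(root)
        queue.extend(subtrees)
    return out


def prev_stack(tree):
    """Preorder vertex list, using a stack of subtrees."""
    out, stack = [], [tree]
    while stack:
        root, subtrees = stack.pop()
        out.append(root)
        stack.extend(reversed(subtrees))   # push S_k first so S_1 is popped first
    return out


t = ("a", [("b", [("d", []), ("e", [])]), ("c", [])])
print(prev_recursive(t), bfv(t), prev_stack(t))
```

The only difference between `bfv` and `prev_stack` is the data structure and the order in which the subtrees are added to it.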

9.1.5.  The  proof  can  be  done  by  induction  on  the  size  of  the  tree  by  showing  that  the  comments  in 
the  algorithm  are  correct.  To  do  this,  we  need  to  notice  a  couple  of  things. 

•  By  removing  r  from  G  before  constructing  S,  we  guarantee  that  S  will  not  contain  r. 
Thus  it  will  contain  precisely  the  vertices  that  are  reachable  on  a  path  from  r,  starting 

with  the  edge  {r,  s}. 

•  Because  we  remove  the  root  vertex  of  the  tree  from  G  and  do  this  recursively,  whenever 
a  tree  is  ready  to  return,  all  its  vertices  have  been  removed  from  G.  As  a  result,  none  of 
the  vertices  in  S  are  left  in  G  when  we  construct  R. 

9.1.6 (a) There is only one tree with vertex set V when $|V| = 1$.

(b) After the root v of T, the next vertex visited in a depth-first traversal is the root of $T_1$.

(c) The root of $T_1$ will be listed in POSTV after all the other vertices of $T_1$ have been visited. The
vertices listed previous to this will be exactly the other vertices of $T_1$ since they will not be
visited again but all other vertices of T will be visited again.

(d) From the previous part, $T_1$ has t vertices. After listing the root v of T, PREV lists the t vertices
of $T_1$ and then lists the other vertices of T.

(e) U is simply T with $T_1$ removed. Thus we can obtain the traversal sequences for U by traversing
T and "forgetting" to list any vertices of $T_1$ that we encounter.

(f) Since $T_1$ and U have fewer vertices than T, they can both be reconstructed by induction. To
get T, simply adjoin $T_1$ to U as a new leftmost child of v.

9.1.8.  Here  is  the  pseudocode. 
PREVandPOSTV(a_1,...,a_n; z_1,...,z_n)
    If n = 1
        Return the single vertex tree a_1
    End if
    Define t by z_t = a_2
    T_1 = PREVandPOSTV(a_2,...,a_{t+1}; z_1,...,z_t)
    U = PREVandPOSTV(a_1, a_{t+2},...,a_n; z_{t+1},...,z_n)
    T is the tree obtained by
    adding T_1 to U as a new leftmost child of a_1
    Return T
End
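A direct Python transcription of this pseudocode is sketched below. It assumes that the vertex labels are distinct and that a tree is represented as a pair (root, list of principal subtrees); both conventions and the function name are choices made for illustration.

```python
def prev_and_postv(a, z):
    """Rebuild an RP-tree from its preorder list a and postorder list z."""
    if len(a) == 1:
        return (a[0], [])
    t = z.index(a[1]) + 1                    # z_t = a_2, so T_1 has t vertices
    t1 = prev_and_postv(a[1:t + 1], z[:t])
    root, subtrees = prev_and_postv([a[0]] + a[t + 1:], z[t:])   # this is U
    return (root, [t1] + subtrees)           # T_1 becomes the leftmost child


pre = ["a", "b", "d", "e", "c"]
post = ["d", "e", "b", "c", "a"]
print(prev_and_postv(pre, post))
```

The repeated slicing and the use of `index` keep the code close to the pseudocode at the cost of efficiency.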

9.1.9 (a) $D(T)$ is $+1, D(T_1), -1, +1, D(T_2), -1, \ldots, +1, D(T_n), -1$.

(b)  Each  edge  is  traversed  twice,  proving  the  sum.  The  rest  can  be  proved  by  induction  on  the 
number  of  vertices  using  the  formula  in  (a).  Actually,  one  can  show  more:  The  sum  up  to  k  is 
the  length  of  the  path  from  the  root  to  the  vertex  that  is  reached  after  k  steps. 

(c)  The  "if"  part  follows  from  (b).  The  "only  if"  part  can  be  done  by  showing  that  there  is  a 
unique  way  to  construct  a  tree  associated  with  such  a  sequence.  This  can  be  done  recursively 
if we use the observation that the sum up to k is 0 if and only if we have returned to the root
after k steps: Let k be the first index for which the sum is 0. We must have $s_1 = +1$, $s_k = -1$
and the subsequences $s_2, \ldots, s_{k-1}$ and $s_{k+1}, \ldots, s_n$ are associated with unique RP-trees. There
is  just  one  way  to  piece  these  trees  together. 

Section  9.2 

9.2.1. 

[Tree diagrams, written here in the form root(left son, right son).]

(a) +(+(+(+(1, 2), 3), 4), 5)

(b) +(+(+(1, 2), +(3, 4)), 5)

9.2.2.  After  "End  if"  insert 

If exp = -exp1
    Return the RP-tree with root "-"
    and son INTERPRET(exp1).

9.2.3. We use value(...) to indicate the value of a variable or constant.

(a) The first method:

EVALUATE(exp)
    If (exp has no op) Return value(exp).
    If (exp = -exp1) Return -value(exp1).
    Let exp = (exp1 op exp2).
    Return EVALUATE(exp1) op EVALUATE(exp2).
End


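[Tree diagrams for 9.2.1, continued, written in the form root(left son, right son).]

(c) +(1, +(2, +(3, +(4, 5))))

(d) /(+(X, *(5, Y)), -(X, Y))

(e) +(-(*(X, Y), 3), *(X, +(X, 1)))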
(b) The second method:

EVALUATE(T)
    Let r be the root of T.
    Let k be the number of principal subtrees of T
    and let T_i be the ith of them.
    If (k = 0), Return value(r).
    For i = 1,...,k Let v_i = EVALUATE(T_i).
    /* If k = 1, r should be unary minus. */
    If k = 1, Return r v_1.
    If k = 2, Return v_1 r v_2.
End
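The second method can be sketched in Python as follows. The tree representation (a pair of root label and list of principal subtrees), the `env` dictionary that supplies values for variables, and the treatment of any other leaf label as a numeric constant are all assumptions made for this illustration.

```python
import operator

# Binary operators allowed at internal vertices (an assumed set).
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def evaluate(tree, env):
    """Evaluate an RP-tree whose leaves are variables or numeric constants."""
    root, subtrees = tree
    if not subtrees:                                   # k = 0
        return env[root] if root in env else float(root)
    values = [evaluate(t, env) for t in subtrees]
    if len(values) == 1:                               # k = 1: unary minus
        if root != "-":
            raise ValueError("a vertex with one son should be unary minus")
        return -values[0]
    return OPS[root](values[0], values[1])             # k = 2


# The tree for (X + 5*Y) / (X - Y).
d = ("/", [("+", [("X", []), ("*", [("5", []), ("Y", [])])]),
           ("-", [("X", []), ("Y", [])])])
print(evaluate(d, {"X": 7, "Y": 2}))
```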

9.2.5.  We  will  indicate  what  needs  to  be  added.  Other  solutions  are  possible. 

(a) exp → - term

(b) term → power       and       power → factor | factor ** power

(c) Let subst be the start symbol now and add    subst → exp | id := exp

(d) This is a bit trickier because the := must reach as far to the right as possible. In particular,
you cannot replace the last three items in the following list with just factor → subst. Let
start be the start symbol.

start → exp | subst
subst → id := exp | id := subst
exp → exp + subst | exp - subst
term → term * subst | term / subst
factor → ( subst )

9.2.6 (a) It consists of all strings of the form $a_1 \pm a_2 \cdots \pm a_n$ where $n \ge 1$, $a_i$ is x or y and $\pm$ is
either "+" or "-", except that y cannot follow "-".

(b) G' is G. The grammar for G'' is

s → x t | y t
t → +(t,+xt,2) | -(t,-xt,2) | +(t,+yt,2) | the-empty-string
(t,+xt,2) → x t
(t,-xt,2) → x t
(t,+yt,2) → y t

The  machine  has  four  states  corresponding  to  the  left  sides  of  the  four  productions  just  given. 
The  start  state  is  s  and  the  accepting  state  is  t.  You  should  be  able  to  draw  it. 

(c)  The  machine  is  deterministic. 

(d)  Yes.  One  can  construct  a  three  state  machine  with  start  state  s  and  accepting  state  t  as  shown 
here. 

[Figure: state diagram of the three-state machine.]
Section 9.3

9.3.1.  The  construction  starts  with  •.  The  first  iteration  gives  •  and  all  trees  that  have  children 
produced  in  the  starting  step.  Thus  we  get 

•  •  •  • 

•  ••  •  •  • 

In  the  next  iteration,  we  obtain  the  following  new  trees  with  at  most  4  vertices. 

•  •  •  • 

•  •  ••  •• 

•  ••  •  • 

In  the  next  step,  the  only  new  tree  is  a  4-vertex  tree  consisting  of  a  path  from  the  root  to  a  single 
leaf.  After  this,  no  new  trees  with  less  than  5  vertices  are  obtained. 

9.3.2.  The  construction  gives  the  trees  in  Figure  9.5,  then  the  next  step  produces  4  new  trees.  After 
that,  no  new  trees  with  less  than  5  leaves  are  obtained. 

9.3.3. For $k < 7$, the values are in the text, $b_8 = 429$, $b_9 = 1430$ and $b_{10} = 4862$.

9.3.4.  Here  are  the  calculations: 

$(b_1b_4 + b_2b_3) + 0 \times b_2 + 0 = 7$

$(b_1b_5 + b_2b_4) + (b_1b_2 + 0 \times b_1 + 0) \times b_3 + 0 = 21$

$(b_1b_5 + b_2b_4) + 0 \times b_2 + 0 = 19$.

9.3.5. We'll use n:r to mean a tree with n leaves and rank r and (n_1:r_1, n_2:r_2) to mean a tree with left son n_1:r_1 and right son n_2:r_2. We use formula (9.4) and the greedy approach: First make $|T_1|$ as large as possible, then make RANK($T_1$) as large as possible. Here are the calculations. You should be able to construct the trees easily from the results as long as you remember (a) that n:0 describes a tree in which all the left sons are leaves (since that is the leftmost tree in the list of trees) and (b) that since there is only one tree with 1 leaf and only one with 2 leaves, they each have rank 0.

8:100 = (1:0, 7:100)   since $b_1b_7 = 132 > 100$, we have $|T_1| = 1$ and $100 = 0\,b_7 + 100$
7:100 = (6:10, 1:0)    since $b_1b_6 + \cdots + b_5b_2 = 90$ and $10 = 10\,b_1 + 0$
6:10 = (1:0, 5:10)     since $b_1b_5 = 14 > 10$ and $10 = 0\,b_5 + 10$
5:10 = (4:1, 1:0)      since $b_1b_4 + \cdots + b_3b_2 = 9$ and $1 = 1\,b_1 + 0$
4:1 = (1:0, 3:1)
3:1 = (2:0, 1:0)

8:200 = (3:1, 5:12)    since $b_1b_7 + b_2b_6 = 174$ and $26 = 1\,b_5 + 12$
5:12 = (4:3, 1:0)      since $b_1b_4 + b_2b_3 + b_3b_2 = 9$
4:3 = (3:0, 1:0)       since $b_1b_3 + b_2b_2 = 3$

8:300 = (7:3, 1:0)
n:3 = (1:0, n-1:3)     for $n \ge 5$ since $b_1b_{n-1} > 3$
4:3 = (3:0, 1:0)

8:400 = (7:103, 1:0)   7:103 = (6:13, 1:0)   6:13 = (1:0, 5:13)
5:13 = (4:4, 1:0)      4:4 = (3:1, 1:0)
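The greedy unranking used above can be sketched in Python. This is an illustration under the assumption that formula (9.4) reads RANK(T) = $\sum_{k<|T_1|} b_kb_{n-k}$ + RANK($T_1$)$\,b_{n-|T_1|}$ + RANK($T_2$), which is what the calculations above use; the names `split` and `unrank` are ours.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def b(n):
    """Number of unlabeled full binary RP-trees with n leaves."""
    return 1 if n == 1 else sum(b(k) * b(n - k) for k in range(1, n))

def split(n, r):
    """Return ((n1, r1), (n2, r2)) for the tree n:r, where n >= 2 and 0 <= r < b(n)."""
    k = 1
    # Make |T1| as large as possible: skip every tree whose left son has fewer leaves.
    while r >= b(k) * b(n - k):
        r -= b(k) * b(n - k)
        k += 1
    # What is left satisfies r = RANK(T1) * b(n - k) + RANK(T2).
    r1, r2 = divmod(r, b(n - k))
    return (k, r1), (n - k, r2)

def unrank(n, r):
    """Build the tree n:r as nested pairs; a leaf is the string '*'."""
    if n == 1:
        return "*"
    (n1, r1), (n2, r2) = split(n, r)
    return (unrank(n1, r1), unrank(n2, r2))

for r in (100, 200, 300, 400):
    print(f"8:{r} =", split(8, r))
```

Calling `unrank(8, 100)` expands the whole chain of splits at once, so the nested pairs it returns can be compared with a tree drawn by hand.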

9.3.6. We will induct on $n$. The result is true for $n = 1$.

Suppose that $T$ is a tree with $n > 1$ leaves. The root of $T$ has a left son with, say, $k > 0$ leaves. Thus the right son has $n - k > 0$ leaves. It follows that $k < n$ and $n - k < n$ and so by induction, the left and right sons have a total of $(k - 1) + (n - k - 1) = n - 2$ other vertices. Counting the root, we see that $T$ has $(n - 2) + 1 = n - 1$ other vertices.
9.3.7 (b) In the notation introduced in Exercise 9.3.5, with $n = 2m + 1$ and $k = b_n/2$, we claim that $M_n = n{:}k = (m+1{:}0,\ m{:}0)$. To prove this, note that the rank of this tree is $b_1b_{2m} + \cdots + b_mb_{m+1}$ and that

\[ b_n = b_1b_{2m} + \cdots + b_{2m}b_1 = 2(b_1b_{2m} + \cdots + b_mb_{m+1}). \]

(c) If $n = 2m$, then $b_n = 2(b_1b_{2m-1} + \cdots + b_{m-1}b_{m+1}) + b_m^2$, which is divisible by 2 if and only if $b_m$ is. Thus there is no such tree unless $b_m$ is even. In this case you should be able to show that $M_{2m} = (m{:}b_m/2,\ m{:}0)$.

9.3.8. Let the function in one line form be $(f_1, \ldots, f_k)$ and denote the rank by RANK$(f_1, \ldots, f_k)$. Then RANK$(n) = n - 1$ and, for $k > 1$ arguments, RANK$(a, b, \ldots, z) = \binom{a-1}{k} +$ RANK$(b, \ldots, z)$.

9.3.10. If $f$ is a permutation of $k$, define $f^*$, a permutation of $k - 1$, by

\[ f^*(i) = f(i+1) - H(f(i+1) - f(1)), \quad\text{where}\quad H(x) = \begin{cases} 0, & \text{if } x < 0, \\ 1, & \text{otherwise.} \end{cases} \]

In effect, $f^*$ is $f$ with the first value removed and the rest pushed down so that the range becomes $k - 1$. Let $p$ be a permutation of $n$. Define

\[ \mathrm{Lex}(p, n) = \begin{cases} 0, & \text{if } n = 1, \\ (p(1) - 1)\,(n-1)! + \mathrm{Lex}(p^*, n - 1), & \text{if } n > 1. \end{cases} \]

We leave it to you to convince yourself that this function does compute the lex order rank of $p$.
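One way to convince yourself is to compare the recursive rank with the position of each permutation in a lexicographic listing. The sketch below assumes the rank starts at 0 and that each choice of first value accounts for $(n-1)!$ permutations; the name `lex_rank` is ours.

```python
from itertools import permutations
from math import factorial

def lex_rank(p):
    """Lex order rank (starting at 0) of a permutation p of 1..n, given as a tuple."""
    n = len(p)
    if n == 1:
        return 0
    first = p[0]
    # p*: drop the first value and push the larger remaining values down by one.
    p_star = tuple(v - 1 if v > first else v for v in p[1:])
    return (first - 1) * factorial(n - 1) + lex_rank(p_star)

# itertools.permutations lists the permutations of a sorted input in lex order.
for i, p in enumerate(permutations(range(1, 5))):
    assert lex_rank(p) == i
print(lex_rank((3, 1, 4, 2)))
```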

9.3.11 (a) An $x_i$ belongs in a parenthesis pair that has nothing inside it. Number the empty pairs from left to right and insert $x_i$ into the $i$th pair.

(b) This is just a translation of what has been said. If you are confused, remember that $B(n)$ is to be thought of as all possible parentheses patterns for $x_1, \ldots, x_n$.

(c) This simply involves the replacement described in (b): Make • correspond to ( ) and make the tree with sons $T_1$ and $T_2$ correspond to $(P_1P_2)$, where $P_i$ is the parentheses pattern corresponding to $T_i$.

9.3.12 (a) We prove by induction on the number of vertices that $f$ maps a tree with $n$ vertices to one with $n$ leaves. This is certainly true for the single vertex tree, [ ] = •. Let $L(T)$ and $V(T)$ be the number of leaves and vertices of a tree. We have

\[ \begin{aligned} L(f([T_1, \ldots, T_k])) &= L(f(T_1)) + L(f([T_2, \ldots, T_k])) && \text{by (9.5);} \\ &= V(T_1) + V([T_2, \ldots, T_k]) && \text{by induction;} \\ &= V([T_1, \ldots, T_k]). \end{aligned} \]

An easy way to see that $f$ is a bijection is to exhibit its inverse. Let $g(\bullet) = \bullet$ and $g([B_1, B_2]) = [g(B_1), S_1, \ldots, S_j]$, where $[B_1, B_2]$ is an unlabeled full binary RP-tree and $g(B_2) = [S_1, \ldots, S_j]$. One can prove by induction that $g(f(T)) = T$ and $f(g(B)) = B$. We do the latter using the above definition of the $S_i$'s.

\[ \begin{aligned} f(g(B)) &= f([g(B_1), S_1, \ldots, S_j]) && \text{by defn. of } g; \\ &= [f(g(B_1)), f([S_1, \ldots, S_j])] && \text{by defn. of } f; \\ &= [f(g(B_1)), f(g(B_2))] && \text{by defn. of } S_i\text{'s;} \\ &= [B_1, B_2] = B && \text{by induction.} \end{aligned} \]

(b) We have introduced an operation called JOIN for two full binary RP-trees. Redefine it for any two RP-trees $S$ and $T = [T_1, \ldots, T_k]$ to be $[S, T_1, \ldots, T_k]$.
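The maps in (a) are easy to experiment with. The sketch below represents an RP-tree as a tuple of its subtrees (so a single vertex is the empty tuple) and assumes, as in the argument above, that $f([T_1, \ldots, T_k]) = [f(T_1), f([T_2, \ldots, T_k])]$ with $f(\bullet) = \bullet$; the helper names are ours.

```python
def f(T):
    """Map an RP-tree (a tuple of subtrees; a single vertex is ()) to a full binary RP-tree."""
    if T == ():
        return ()
    return (f(T[0]), f(T[1:]))

def g(B):
    """Inverse map: B is a full binary RP-tree, so B is () or a pair (B1, B2)."""
    if B == ():
        return ()
    return (g(B[0]),) + g(B[1])

def vertices(T):
    return 1 + sum(vertices(S) for S in T)

def leaves(T):
    return 1 if T == () else sum(leaves(S) for S in T)

T = ((), ((), ()), ((),))        # a root with three sons
B = f(T)
print(vertices(T), leaves(B))    # the two counts agree
print(g(B) == T, f(g(B)) == B)
```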
9.3.13 (a) The leaves in an RP-tree are distinguishable because the tree is ordered. Thus, each marking of the $n$ leaves leads to a different situation. The same comments apply to vertices and there are $2n - 1$ vertices by Exercise 9.3.6.

(b) Mark the single vertex that arises in this way to obtain an element of $\mathcal{V}_n$. Interchanging $x$ and the tree rooted at $b$ gives a different element of $\mathcal{L}_{n+1}$ that gives rise to the same element of $\mathcal{V}_n$.

Conversely, given any element of $\mathcal{V}_n$, the marked vertex should be split into two, $f$ and $b$ with $b$ a son of $f$. Introduce another son $x$ of $f$ which is a marked leaf. There are two possibilities: $x$ can be the left son or the right son of $f$.

(c) By (a), $|\mathcal{L}_n| = nb_n$ and $|\mathcal{V}_n| = (2n - 1)b_n$. By (b), $|\mathcal{L}_{n+1}| = 2|\mathcal{V}_n|$.

(d) By the recursion,

\[ b_n = \frac{2(2n-3)}{n}\,b_{n-1} = \frac{2(2n-3)}{n}\,\frac{2(2n-5)}{n-1}\,b_{n-2} = \cdots = \frac{2^{n-1}(2n-3)(2n-5)\cdots 1}{n(n-1)\cdots 2}\,b_1. \]

Using $b_1 = 1$, we have a simple formula; however, it can be written more compactly:

\[ b_n = \frac{2^{n-1}(2n-3)(2n-5)\cdots 1}{n!} = \frac{2^{n-1}(n-1)!\,(2n-3)(2n-5)\cdots 1}{(n-1)!\,n!} = \frac{(2n-2)!}{(n-1)!\,n!} = \frac{1}{n}\binom{2n-2}{n-1}. \]


Section  10.1 

10.1.1 (d) Note that $r = 1 + x + x^2 + x^3 + \cdots$. Whenever you add, subtract or multiply power series and look for the coefficient of some power of $x$, say $x^n$, only those powers of $x$ that do not exceed $n$ in the original series matter. Each of $p$, $q$ and $r$ begins $1 + x + x^2 + x^3$.

10.1.2. We can neglect all terms above $x^2$ at each step of our calculations. We'll use $\approx$ to indicate that we've neglected such terms.

(a) $(2 + x + x^2)(1 + 2x + x^2)(1 + x + 2x^2) \approx (2 + 5x + 5x^2)(1 + x + 2x^2)$ and so the answer is 14.

(b) $(1 + 2x + x^2)^2 \approx 1 + 4x + 6x^2$, $(1 + x + 2x^2)^3 \approx 1 + 3x + 9x^2$, $(1 + 4x + 6x^2)(1 + 3x + 9x^2) \approx 1 + 7x + 27x^2$, and so the answer is 62.

(c) This is the same as the coefficient of $x$ in $(1 + x)^{43}(2 - x)^5$, so we need keep only constant and linear terms. Thus $(1 + x)^{43} \approx 1 + 43x$ and $(2 - x)^5 \approx 2^5 - 5 \times 2^4x = 32 - 80x$. The answer is $32 \times 43 - 80 = 1296$.
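The idea of discarding high powers at every step is easy to mechanize. The following is a small Python sketch (the helper name `mul_trunc` is ours) that reproduces the arithmetic in (a) and (c) with polynomials stored as coefficient lists.

```python
def mul_trunc(p, q, d):
    """Product of the coefficient lists p and q, keeping only terms up to x^d."""
    out = [0] * (d + 1)
    for i, a in enumerate(p[: d + 1]):
        for j, c in enumerate(q[: d + 1 - i]):
            out[i + j] += a * c
    return out

# (a) coefficient of x^2 in (2 + x + x^2)(1 + 2x + x^2)(1 + x + 2x^2)
step = mul_trunc([2, 1, 1], [1, 2, 1], 2)
print(step)                              # [2, 5, 5]
print(mul_trunc(step, [1, 1, 2], 2)[2])  # 14

# (c) coefficient of x in (1 + x)^43 (2 - x)^5
p = [1]
for _ in range(43):
    p = mul_trunc(p, [1, 1], 1)
q = [1]
for _ in range(5):
    q = mul_trunc(q, [2, -1], 1)
print(p, q, mul_trunc(p, q, 1)[1])       # [1, 43] [32, -80] 1296
```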

10.1.3.  We  have 

{x^  +  x""  +  x"  +  x'' +  x^f  =  X^\l  +  X  +  X^  +  X^  +  X^f  =  X 

The required coefficient is the coefficient of $x^5$ in the eighth power on the right hand side. Since $(1 - x^5)^8 = 1 - 8x^5 + \cdots$, this is simply the coefficient of $x^5$ in $(1 - x)^{-8}$ minus 8 times the coefficient of $x^0$ (the constant term) in $(1 - x)^{-8}$. Thus our answer is

\[ \binom{12}{5} - 8 = 784. \]
10.1.4 (a) Let $f(z) = (1 + z)^r$. It is easy to show that

\[ f^{(k)}(z) = r(r-1)\cdots(r-k+1)(1+z)^{r-k}. \]

Thus $f^{(k)}(0)/k! = \binom{r}{k}$. Taylor's Theorem completes the derivation as long as we ignore convergence.

To prove convergence, we need to verify that the remainder term in Taylor's formula goes to 0 as $k \to \infty$. We'll do this for $|x| \le 1/3$. Taylor's Theorem with remainder states that

\[ f(x) = \sum_{n=0}^{k-1} \frac{f^{(n)}(0)}{n!}\,x^n + R_k(x) \quad\text{where}\quad |R_k(x)| \le \frac{M|x|^k}{k!} \]

and $M = \max_{|t| \le |x|} |f^{(k)}(t)|$. From the above calculations, $f^{(k)}(t)/k! = \binom{r}{k}(1+t)^{r-k}$.

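We first bound the binomial coefficient. Since we are interested in convergence, we can assume that $k$ is as large as we wish. If $r \ge 0$, then $|r - k| < k$ when $k$ is large enough and so

\[ \left|\binom{r}{k}\right| = \left|\frac{r}{1}\right| \left|\frac{r-1}{2}\right| \cdots \left|\frac{r-k+1}{k}\right| \]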

is  bounded  because  all  but  the  first  few  factors  are  1  or  less.  On  the  other  hand,  if  r  <  0  let  m 
be  an  integer  greater  than  — r  and  note  that 


< 


1 


1 


(to  —  1)!     TO    TO  +  1 


k 


{k-ry 


Since all but the last factor is at most 1, it follows that this expression is bounded by a constant (depending on $r$) times a power of $k$. Putting all this together, there are constants $A$ and $B$ depending on $r$ so that $\left|\binom{r}{k}\right| < Ak^B$ when $k$ is large.

Now suppose that $|t| \le |x| \le 1/3$. Then for large $k$, $k - r > 0$ and

\[ |R_k(x)| < Ak^B\,\frac{|x|^k}{\min_t (1+t)^{k-r}} \le Ak^B\,\frac{(1/3)^k}{(1 - 1/3)^{k-r}} = A(2/3)^r k^B (1/2)^k, \]

which goes to 0 as $k \to \infty$.

(b) In the previous result, set $r = -1$. Then $\binom{r}{k} = \binom{-1}{k} = (-1)^k$ and $(-z)^k = (-1)^kz^k$. Thus $(1 - z)^{-1} = \sum (-1)^k(-1)^kz^k = \sum z^k$. Multiply both sides by $a$.

(c) You may be familiar with this formula: it's the sum of a (finite) geometric series. There are various ways to obtain the result. We'll give two.

First, from scratch. Let $S = \sum_{k=0}^{n} az^k$. Note that the $k$th term of $zS$ equals the $(k+1)$st term of $S$. Thus almost all terms cancel if we subtract $zS$ from $S$. In fact, $S - zS = a - az^{n+1}$. Solve this for $S$.

Second, from (b). Note that with $S$ as above and $i = k - (n+1)$,

\[ \sum_{k \ge 0} az^k = S + \sum_{k \ge n+1} az^k = S + \sum_{i \ge 0} (az^{n+1})z^i. \]

By (b), the leftmost sum is $a/(1 - z)$ and the rightmost sum is $(az^{n+1})/(1 - z)$. Solve this for $S$.

(d) Use (a) with $(z, r) = (-ax, -2)$. Note that $\binom{-2}{k} = (-1)^k(k+1)$. Thus the coefficient of $x^n$ is $(n+1)a^n$.
10.1.5. We'll do just the general $k$.

(a) We have $x^kA(x) = \sum_{m \ge 0} a_mx^{m+k} = \sum_{n \ge k} a_{n-k}x^n$. Thus the coefficient of $x^n$ is 0 for $n < k$ and $a_{n-k}$ for $n \ge k$.

(b) We have

\[ \left(\frac{d}{dx}\right)^k \sum_{m=0}^{\infty} a_mx^m = \sum_{m=k}^{\infty} a_m\,m(m-1)\cdots(m-k+1)\,x^{m-k}. \]

Set $n = m - k$ to obtain the answer: $a_{n+k}(n+k)(n+k-1)\cdots(n+1) = a_{n+k}\,\dfrac{(n+k)!}{n!}$.

(c) Since $\left(x\frac{d}{dx}\right)A(x) = \sum_{m=0}^{\infty} ma_mx^m$, repeating the operation $k$ times leads to $\sum_{m=0}^{\infty} m^ka_mx^m$. Thus the answer is $n^ka_n$.
10.1.6 (a) Let $b_n = 1$ for all $n \ge 0$ in the theorem. Then $C(x) = A(x)/(1 - x)$.

(b) Since we are summing on $k$, not $n$, this may be a bit confusing. We apply (a) with the $n$ in (a) replaced by $k$ and $a_j = (-1)^j\binom{n}{j}$. Thus the generating function for the sum is $C(x) = A(x)/(1 - x)$. By the Binomial Theorem (p. 17), $A(x) = (1 - x)^n$. Thus $C(x) = (1 - x)^{n-1}$ and so, by the Binomial Theorem again, the answer is $(-1)^k\binom{n-1}{k}$.

(c) You should be able to see that

\[ d_n = \sum_{i=0}^{n} a_ie_{n-i} \quad\text{where}\quad e_m = \sum_{j=0}^{m} b_jc_{m-j}. \]

Thus $D(x) = A(x)E(x)$ and $E(x) = B(x)C(x)$, and so $D(x) = A(x)B(x)C(x)$.

10.1.7. This is simply the derivation of (10.4) with $r$ used instead of 1/3. The generating function for the sum is $S(x) = 1/(1 - r(1 + x))$ and the coefficient of $x^k$ is

\[ \frac{(r/(1-r))^k}{1 - r} = \frac{(r/(1-r))^{k+1}}{r}. \]

To verify convergence, let $a_n = \binom{n}{k}r^n$ and note that

\[ \lim_{n \to \infty} \frac{a_{n+1}}{a_n} = \lim_{n \to \infty} \frac{(n+1)\,r}{n - k + 1} = r < 1. \]


10.1.8. The main difficulty here is understanding what corresponds to what. In the convolution formula, an index $k$ ranges from 0 to $n$ in the summation and here $i$ ranges from 0 to $k$. Thus $(n, k)$ in the convolution formula corresponds to $(k, i)$ here. The values of $n$ and $m$ here are simply constants. Let's choose the unused variable $j$ for an index in hopes of avoiding some confusion. We have $a_j = \binom{m}{j}$ and $b_j = \binom{n}{j}$. By the Binomial Theorem, $A(x) = (1 + x)^m$ and $B(x) = (1 + x)^n$. Thus $C(x) = A(x)B(x) = (1 + x)^{m+n}$. By the Binomial Theorem, $c_j = \binom{m+n}{j}$.

10.1.9. This is very similar to Exercise 10.1.8. With $a_j = (-1)^j\binom{m}{j}$ and $b_j = \binom{m}{j}$, we can apply the convolution formula. The result is $C(x) = (1 - x)^m(1 + x)^m = (1 - x^2)^m$. By the binomial theorem, $(1 - x^2)^m = \sum_j (-1)^j\binom{m}{j}x^{2j}$. Thus, the sum we are to simplify is zero if $k$ is odd and $(-1)^j\binom{m}{j}$ if $k = 2j$.
10.1.10. It's simply a matter of expanding generating functions and looking at coefficients, realizing that $(-1)^n$ is $+1$ or $-1$ according as $n$ is even or odd. The answer for (b) is $(A(x) - A(-x))/2$. To answer (c), use (a) with $A(x) = (1 + x)^n$. The result is $\frac{1}{2}\left((1 + x)^n + (1 - x)^n\right)$. When $x = 1$ we obtain $2^{n-1}$.

10.1.11. The essential fact is that $\sum_{s=0}^{k-1} e^{2\pi i rs/k}$ is $k$ if $r$ is a multiple of $k$ and 0 otherwise.

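10.1.12. As a first method, we'll follow the example in the text. If $F(x, y) = \sum_{n,k} \binom{2n}{k}x^ky^n$, then the generating function for $S_k$ is $S(x) = F(x, 1/2)$. We have

\[ F(x, y) = \sum_{n=0}^{\infty} (1 + x)^{2n}y^n = \sum_{n=0}^{\infty} \left((1 + x)^2y\right)^n = \frac{1}{1 - (1 + x)^2y}. \]

Hence $S(x) = 2/(1 - 2x - x^2)$. This can be done by partial fractions as discussed in the next section.

Now we do the problem using bisection of series. Let $G(x, y) = \sum_{n,k} \binom{n}{k}x^ky^n = (1 - (1 + x)y)^{-1}$. We use bisection to extract the part with $n$ even and then set $y = 1/\sqrt{2}$ to obtain $S(x)$:

\[ \begin{aligned} S(x) = \frac{G(x, y) + G(x, -y)}{2} &= \frac{\left(1 - (1 + x)/\sqrt{2}\right)^{-1} + \left(1 + (1 + x)/\sqrt{2}\right)^{-1}}{2} \\ &= \frac{1}{2}\left(\frac{\sqrt{2}}{\sqrt{2} - 1}\,\frac{1}{1 - x/(\sqrt{2} - 1)} + \frac{\sqrt{2}}{\sqrt{2} + 1}\,\frac{1}{1 + x/(\sqrt{2} + 1)}\right) \\ &= \frac{1}{2}\left(\frac{\sqrt{2}\,(\sqrt{2} + 1)}{1 - (\sqrt{2} + 1)x} - \frac{\sqrt{2}\,(1 - \sqrt{2})}{1 - (1 - \sqrt{2})x}\right). \end{aligned} \]

By the formula for geometric series, $S_k = \frac{1}{2}\left(\sqrt{2}\,(1 + \sqrt{2})^{k+1} - \sqrt{2}\,(1 - \sqrt{2})^{k+1}\right)$.

Now we do the problem using the bisection of series idea of the previous paragraph and the result of Exercise 10.1.7. If we call the answer there $a_k(r)$, then we have

\[ S_k = \tfrac{1}{2}\left(a_k(1/\sqrt{2}) + a_k(-1/\sqrt{2})\right) = \tfrac{1}{2}\left(\sqrt{2}\,(1 + \sqrt{2})^{k+1} - \sqrt{2}\,(1 - \sqrt{2})^{k+1}\right). \]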
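A numerical check of the three descriptions of $S_k$ is a useful guard against sign slips. The sketch below assumes, consistent with $S(x) = F(x, 1/2)$, that $S_k = \sum_n \binom{2n}{k}2^{-n}$; the function names are ours, and the infinite sum is simply cut off after 400 terms.

```python
from math import comb, sqrt, isclose

def s_recurrence(K):
    """Coefficients s_0..s_K of 2/(1 - 2x - x^2): s_k = 2 s_{k-1} + s_{k-2}."""
    s = [2, 4]
    while len(s) <= K:
        s.append(2 * s[-1] + s[-2])
    return s[: K + 1]

def s_closed(k):
    """The closed form obtained from the geometric series."""
    r2 = sqrt(2)
    return (r2 / 2) * ((1 + r2) ** (k + 1) - (1 - r2) ** (k + 1))

def s_direct(k, terms=400):
    """Partial sum of sum_n C(2n, k) / 2^n (the neglected tail is negligible)."""
    return sum(comb(2 * n, k) / 2 ** n for n in range(terms))

for k, exact in enumerate(s_recurrence(6)):
    print(k, exact, round(s_closed(k), 6), round(s_direct(k), 6))
```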

10.1.13. This is multisection with $k = 3$ and $j = 0, 2, 1$, respectively. The basic facts that are needed are $e^{i\theta} = \cos\theta + i\sin\theta$ and the sine and cosine of various angles in the 30°-60°-90° right triangle.

10.1.14 (a) By definition

\[ N_r = \sum |S_{i_1} \cap \cdots \cap S_{i_r}|. \]

Thus any object is counted in $N_r$ as many times as we can choose a set of $r$ $S_i$'s each of which contains the object. If an object lies in exactly $j$ of the $S_i$'s, the number of times we can make this choice is $\binom{j}{r}$. Partition the objects according to $j$ to obtain the result ($k = j - r$).

(b) Multiplying by $x^r$, summing on $r$:

\[ N(x) = \sum_{j,r} E_j\binom{j}{r}x^r = \sum_j E_j(1 + x)^j. \]

(c) Replace $x$ by $x - 1$ in (b) and equate coefficients of $x^k$ in $E(x) = N(x - 1)$ to obtain the result.
Section 10.2

10.2.1 (a) Let $a_n = 5a_{n-1} - 6a_{n-2} + b_n$ where $b_1 = 1$ and $b_n = 0$ for $n \ne 1$. Then

\[ A(x) = \sum_{k=0}^{\infty} \left(5x\,a_{k-1}x^{k-1} - 6x^2\,a_{k-2}x^{k-2}\right) + x = 5xA(x) - 6x^2A(x) + x. \]

Thus

\[ A(x) = \frac{x}{1 - 5x + 6x^2} = \frac{1}{1 - 3x} - \frac{1}{1 - 2x} \]

and $a_n = 3^n - 2^n$.

(b) To correct the recursion, add $c_{n+1}$ to the right side, where $c_0 = 1$ and $c_n = 0$ for $n \ne 0$. Multiply both sides by $x^{n+1}$ and sum to obtain $A(x) = xA(x) + 6x^2A(x) + 1$. With some algebra,

\[ A(x) = \frac{1}{1 - x - 6x^2} = \frac{3/5}{1 - 3x} + \frac{2/5}{1 + 2x} \]

and so $a_n = (3^{n+1} - (-2)^{n+1})/5$.

(c) To correct the recursion, add $b_n$ where $b_1 = 1$ and $b_n = 0$ otherwise. $A(x) = xA(x) + x^2A(x) + 2x^3A(x) + x$ and so

\[ A(x) = \frac{x}{1 - x - x^2 - 2x^3} = \frac{x}{(1 - 2x)(1 + x + x^2)}. \]

By the quadratic formula, we can factor $1 + x + x^2$ as $(1 - \omega x)(1 - \bar\omega x)$, where $\omega = (-1 + \sqrt{-3})/2$ and $\bar\omega$ is the complex conjugate of $\omega$. Using partial fractions,

\[ A(x) = \frac{2/7}{1 - 2x} - \frac{(3 + 2\sqrt{-3})/21}{1 - \omega x} - \frac{(3 - 2\sqrt{-3})/21}{1 - \bar\omega x} \]

and so

\[ a_n = \frac{2^{n+1}}{7} - \frac{(3 + 2\sqrt{-3})\,\omega^n}{21} - \frac{(3 - 2\sqrt{-3})\,\bar\omega^n}{21}. \]

The last two terms are messy, but they can be simplified considerably by noting that $\omega^3 = 1$ and so they are periodic with period 3. Thus

\[ a_n = \frac{2^{n+1}}{7} + \begin{cases} -2/7 & \text{if } n/3 \text{ has remainder 0;} \\ 3/7 & \text{if } n/3 \text{ has remainder 1;} \\ -1/7 & \text{if } n/3 \text{ has remainder 2.} \end{cases} \]

(d) The recursion holds for $n = 0$ as well. From the recursion, $A(x) = 2xA(x) + \sum nx^n$. By Exercise 10.1.5, the sum is $x\frac{d}{dx}\sum x^n$, which is $x/(1 - x)^2$. Thus

\[ A(x) = \frac{x}{(1 - x)^2(1 - 2x)} = \frac{2}{1 - 2x} - \frac{1}{1 - x} - \frac{1}{(1 - x)^2}. \]

After some algebra with these, we obtain $a_n = 2^{n+1} - n - 2$.
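The four closed forms can be checked against the power series of the corresponding rational generating functions. The sketch below expands $P(x)/Q(x)$ by the usual long-division recursion; the helper name `series` is ours, and exact rational arithmetic is used so the comparison is not blurred by rounding.

```python
from fractions import Fraction

def series(num, den, N):
    """First N+1 power series coefficients of num(x)/den(x); den[0] must be nonzero."""
    a = []
    for n in range(N + 1):
        c = Fraction(num[n] if n < len(num) else 0)
        for j in range(1, min(n, len(den) - 1) + 1):
            c -= den[j] * a[n - j]
        a.append(c / den[0])
    return a

N = 10
# (a) x/(1 - 5x + 6x^2)
assert series([0, 1], [1, -5, 6], N) == [3**n - 2**n for n in range(N + 1)]
# (b) 1/(1 - x - 6x^2)
assert series([1], [1, -1, -6], N) == [
    Fraction(3**(n + 1) - (-2)**(n + 1), 5) for n in range(N + 1)
]
# (c) x/(1 - x - x^2 - 2x^3), with the period-3 correction -2/7, 3/7, -1/7
corr = [-2, 3, -1]
assert series([0, 1], [1, -1, -1, -2], N) == [
    Fraction(2**(n + 1) + corr[n % 3], 7) for n in range(N + 1)
]
# (d) x/((1 - x)^2 (1 - 2x)) = x/(1 - 4x + 5x^2 - 2x^3)
assert series([0, 1], [1, -4, 5, -2], N) == [2**(n + 1) - n - 2 for n in range(N + 1)]
print("all four closed forms agree with the series through x^10")
```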

10.2.2. We can write $S(n) = 2S(n - 1) + 1 + a_n$ for all $n \ge 0$ if we define $S(0) = 0$, $a_0 = -1$, and $a_n = 0$ for $n \ne 0$. Let $s(x) = \sum_{n \ge 0} S(n)x^n$. From the recursion,

\[ s(x) = 2xs(x) + \sum_{n \ge 0} x^n - 1 = 2xs(x) + x/(1 - x). \]

Thus $s(x) = x/((1 - x)(1 - 2x))$. By partial fractions, $s(x) = (1 - 2x)^{-1} - (1 - x)^{-1}$ and so $S(n) = 2^n - 1$.
10.2.3. Start with a string of $n - i$ zeroes. Choose without repetition $i$ of the $n + 1 - i$ positions (before all the zeroes or after any zero) and insert a one in each position chosen. The result is an $n$ long string with $i$ ones, none of them adjacent. The process is reversible: The position of a one is the number of zeroes preceding it. The formula for $F_n$ follows immediately.
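The count can be confirmed by brute force. The sketch below assumes the formula in question is $\sum_i \binom{n+1-i}{i}$, which is what the construction above gives when summed over the number $i$ of ones; the function names are ours.

```python
from itertools import product
from math import comb

def brute(n):
    """Number of n-long strings of zeroes and ones with no two adjacent ones."""
    return sum(1 for s in product("01", repeat=n) if "11" not in "".join(s))

def by_formula(n):
    # Place i ones in i of the n + 1 - i gaps around n - i zeroes, for each i.
    return sum(comb(n + 1 - i, i) for i in range(n + 1))

print([by_formula(n) for n in range(10)])
assert all(brute(n) == by_formula(n) for n in range(12))
```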

10.2.4 (a) The elements past the last zero must alternate one and two since there can be no adjacent ones or adjacent twos. There are $n - k$ such terms. The first $k$ elements in the sequence can be anything as long as there are no adjacent ones or twos. Since the $k$th element is a zero, removing it leads to $k - 1$ elements that must satisfy the same conditions. (Ending in zero is critical: If the $k$th element had been one, removing it would lead to $k - 1$ elements with the additional condition that the last element could not be one.)

(b) The sequence can be divided into a first part (up to the last zero) and a last part (after the last zero). If $k \ne 0$, the first part is an arbitrary sequence of the same sort containing $k - 1$ terms. If $0 < k < n$, the last part is either 1212··· or 2121··· and so there are $2s_{k-1}$ such sequences. If $k = n$, the last part is empty and so there are $s_{n-1}$ such sequences. If $k = 0$, the first part is empty and the last part is an $n$ long sequence of ones and twos. There are 2 such sequences. Putting all this together gives the recursion.

(c) Let $b_0 = 1$, $b_n = 2$ for $n > 0$, $a_0 = 1$, $a_n = s_{n-1}$ for $n > 0$ and $c_n = s_n$. With this definition, $c_n = \sum_{k=0}^{n} a_kb_{n-k}$ for all $n$. Thus $C(x) = A(x)B(x)$.

(e) $1 - 2x - x^2 = (1 - (1 + \sqrt{2})x)(1 - (1 - \sqrt{2})x)$ and

\[ \frac{1 + x}{1 - 2x - x^2} = \frac{(1 + \sqrt{2})/2}{1 - (1 + \sqrt{2})x} + \frac{(1 - \sqrt{2})/2}{1 - (1 - \sqrt{2})x}. \]

Thus $2s_n = (1 + \sqrt{2})^{n+1} + (1 - \sqrt{2})^{n+1}$.

Checking: This gives $s_0 = 1$, $s_1 = 3$ and $s_2 = 7$, which are correct because we defined $s_0 = 1$, there are clearly three sequences of length 1, and the sequences of length two are all of the $3^2$ 2-long sequences of {0, 1, 2} except 1,1 and 2,2.

(f) This follows from the previous part since $|1 - \sqrt{2}| < 1$.
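The formula in (e) can be compared with a direct enumeration. The sketch below (function names are ours) avoids floating point by carrying $(1 + \sqrt{2})^{n+1} = u + v\sqrt{2}$ as the integer pair $(u, v)$, so that $s_n = u$.

```python
from itertools import product

def brute(n):
    """Count n-long sequences over {0, 1, 2} with no adjacent ones and no adjacent twos."""
    count = 0
    for s in product("012", repeat=n):
        t = "".join(s)
        if "11" not in t and "22" not in t:
            count += 1
    return count

def closed(n):
    # Exact arithmetic on numbers u + v*sqrt(2), stored as the pair (u, v).
    u, v = 1, 0
    for _ in range(n + 1):
        u, v = u + 2 * v, u + v   # multiply by 1 + sqrt(2)
    # (1 + sqrt 2)^(n+1) + (1 - sqrt 2)^(n+1) = 2u, so s_n = u.
    return u

print([closed(n) for n in range(8)])
assert all(brute(n) == closed(n) for n in range(9))
```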

10.2.5 (a) Replacing $A$, $B$ and $C$ with their definitions and rearranging leads to

\[ L_1L_2 + L_1H_2x^m + L_2H_1x^m + H_1H_2x^{2m} = (L_1 + H_1x^m)(L_2 + H_2x^m). \]

(b) The number of multiplications required by any procedure is an upper bound on $M(2m)$. There are three products of polynomials of degree $m$ or less in our "less direct" procedure. If they are done as efficiently as possible, we will have $M(2m) \le 3M(m)$.

(c) Let $s_k = M(2^k)$. We have $s_0 = 1$ and $s_k \le 3s_{k-1}$ for $k > 0$. If we set $t_0 = 1$ and $t_k = 3t_{k-1}$ for $k > 0$, then $s_k \le t_k$. The recursion gives $T(x) = 3xT(x) + 1$ and so $t_k = 3^k$. Thus, with $n = 2^k$, $M(n) \le 3^k = (2^{\log_2 3})^k = n^{\log_2 3}$. From tables or a calculator, $\log_2 3 = 1.58\cdots$.

(d) To begin with $L_1(x) = 1 + 2x$, $H_1(x) = -1 + 3x$, $L_2(x) = 5 + 2x$ and $H_2(x) = -x$. The product $L_1L_2 = (1 + 2x)(5 + 2x)$ is computed using the algorithm. The values are $m = 1$, $A = (2)(2) = 4$, $B = (1)(5) = 5$ and $C = (1 + 2)(5 + 2) = 21$. Thus $L_1L_2 = 5 + 12x + 4x^2$. In a similar way, the products $(-1 + 3x)(-x) = x - 3x^2$ and $(5x)(5 + x) = 25x + 5x^2$ are computed; these are combined to give the final result:

\[ (5 + 12x + 4x^2) + (x - 3x^2)x^4 + \left((25x + 5x^2) - (5 + 12x + 4x^2) - (x - 3x^2)\right)x^2, \]

which is $5 + 12x - x^2 + 12x^3 + 4x^4 + x^5 - 3x^6$.
(e) We'll just look at the case in which $n = 2m = 2^k$. Let $a_k$ be the number of additions and subtractions needed. We have $a_0 = 0$ and, for $k > 0$, $a_k$ equals $3a_{k-1}$ plus the number of additions and subtractions needed to prepare for and use the three multiplications. Preparation requires two additions of polynomials of degree $m - 1$. The results are three polynomials of degree $2m - 2$. We must perform two subtractions of such polynomials. Finally, the multiplication by $x^m$ and $x^{2m}$ arranges things so that there is some overlap among the coefficients. In fact, there will be $2m - 2$ additions required because of these overlaps (unless some coefficients happen to turn out zero). Since a polynomial of degree $d$ has $d + 1$ coefficients, there are a total of

\[ 2(m - 1 + 1) + 2(2m - 2 + 1) + (2m - 2) = 4n - 4. \]

Thus $a_k = 3a_{k-1} + 4 \times 2^k - 4$ and so $A(x) = 3xA(x) + 4\sum_{k>0}(2x)^k - 4\sum_{k>0}x^k$. Consequently, $A(x) = 4x/((1 - x)(1 - 2x)(1 - 3x))$ and $a_k = 2 \times 3^{k+1} - 2^{k+3} + 2$. Comparing this with the multiplication result, we see that for large $k$ we need about six times as many additions and/or subtractions as we do multiplications, which is still much smaller than $n^2$ for large $n$.
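The three-multiplication procedure of parts (a) through (d) can be sketched as a short recursive Python function. It assumes both polynomials are given as coefficient lists of the same length, a power of 2, and it returns the number of scalar multiplications alongside the product; the names `multiply`, `add` and `sub` are ours.

```python
def add(p, q):
    return [a + b for a, b in zip(p, q)]

def sub(p, q):
    return [a - b for a, b in zip(p, q)]

def multiply(p, q):
    """Multiply polynomials given as equal-length coefficient lists (length a power of 2).

    Returns (coefficients of the product, number of scalar multiplications used).
    """
    n = len(p)
    if n == 1:
        return [p[0] * q[0]], 1
    m = n // 2
    L1, H1 = p[:m], p[m:]          # p = L1 + H1 * x^m
    L2, H2 = q[:m], q[m:]          # q = L2 + H2 * x^m
    A, ca = multiply(H1, H2)
    B, cb = multiply(L1, L2)
    C, cc = multiply(add(L1, H1), add(L2, H2))
    middle = sub(sub(C, A), B)     # equals L1*H2 + L2*H1
    out = [0] * (2 * n - 1)
    for i in range(2 * m - 1):
        out[i] += B[i]
        out[i + m] += middle[i]
        out[i + 2 * m] += A[i]
    return out, ca + cb + cc

coeffs, count = multiply([1, 2, -1, 3], [5, 2, 0, -1])
print(coeffs)   # [5, 12, -1, 12, 4, 1, -3]
print(count)    # 9
```

The inputs are the polynomials of part (d), $1 + 2x - x^2 + 3x^3$ and $5 + 2x - x^3$, and the count of 9 is $3^k$ with $n = 4 = 2^2$.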

10.2.6 (a) A tree is either a single vertex ($t_1 = 1$) or it has $n + 1$ vertices consisting of a root joined to either one $n$-vertex tree or two trees having a total of $n$ vertices.

(b) The recursion $t_{n+1} = a_{n+1} + t_n + \sum_{k=0}^{n} t_kt_{n-k}$, where $a_1 = 1$ and $a_n = 0$ otherwise, is valid for all $n$. Multiply by $x^{n+1}$ and sum to obtain $T(x) = x + xT(x) + xT(x)^2$.

(c) The equation in (b) can be written as $xT^2 - (1 - x)T + x = 0$, a quadratic in $T = T(x)$. The result follows from the quadratic formula. As $x \to 0$, $T(x) \to t_0 = 0$. The denominator in the solution approaches zero and the numerator approaches $1 \pm 1$, thus the minus sign is correct.

Section  10.3 

10.3.1 (a) $(1 - x)D' - D = -e^{-x} = -(1 - x)D$ and so $(1 - x)D' - xD = 0$.

(b) The coefficient of $x^n$ on the left of our equation in (a) is

\[ \frac{D_{n+1}}{n!} - \frac{D_n}{(n-1)!} - \frac{D_{n-1}}{(n-1)!}. \]

The initial conditions are $D_0 = 1$ and $D_1 = 0$.
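Setting the coefficient of $x^n$ in $(1 - x)D' - xD$ to zero gives $D_{n+1} = n(D_n + D_{n-1})$, assuming $D(x) = \sum D_nx^n/n!$ is the exponential generating function for derangements. A short Python sketch (function names ours) compares this with a direct count of permutations that have no fixed points.

```python
from itertools import permutations

def derangements(N):
    """D_0..D_N from D_{n+1} = n (D_n + D_{n-1}) with D_0 = 1 and D_1 = 0."""
    D = [1, 0]
    for n in range(1, N):
        D.append(n * (D[n] + D[n - 1]))
    return D[: N + 1]

def brute(n):
    """Count permutations of 0..n-1 that leave no element in place."""
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))

print(derangements(7))
assert derangements(7) == [brute(n) for n in range(8)]
```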

10.3.2 (b) Looking at the coefficient of $x^n$ in the differential equation gives

\[ (n + 1)a_{n+1} - 2na_n - 3(n - 1)a_{n-1} = a_n + 3a_{n-1}. \]

Rearrangement leads to the recursion. The initial conditions require that we specify $a_0$ and $a_1$, the first two coefficients in the power series for $A(x)$. Thus $a_0 = A(0) = 1$ and $a_1 = A'(0) = 1$.

(c) Following the instructions we have

\[ \begin{aligned} A(x) = \left(1 - (2x + 3x^2)\right)^{-1/2} &= \sum_{k=0}^{\infty} \binom{-1/2}{k}(-1)^k(2x + 3x^2)^k \\ &= \sum_{k=0}^{\infty} \binom{-1/2}{k}(-1)^k \sum_{i} \binom{k}{i}(2x)^i(3x^2)^{k-i} \\ &= \sum_{i,k} \binom{-1/2}{k}(-1)^k\binom{k}{i}\,2^i3^{k-i}x^{2k-i}. \end{aligned} \]

If we set $2k - i = n$ so that $i = 2k - n$, the last sum can be rewritten:

\[ \sum_{n,k} \binom{-1/2}{k}(-1)^k\binom{k}{2k - n}\,2^{2k-n}3^{n-k}x^n. \]

Thus

\[ a_n = \sum_k \binom{-1/2}{k}(-1)^k\binom{k}{2k - n}\,2^{2k-n}3^{n-k}. \]

What values of $k$ does the sum range over? In the expansion we obtained for $A(x)$ summing on $i$ and $k$, these indices could range over all nonnegative integers with $i \le k$. Hence $k$ can range over all nonnegative integers such that $2k - n \le k$. In other words, $k \le n$. The fractional binomial coefficient can be rearranged:

\[ (-1)^k\binom{-1/2}{k} = \frac{(1/2)(1/2 + 1)\cdots(1/2 + (k - 1))}{k!} = \frac{1 \cdot 3 \cdots (2k - 1)}{2^kk!} = \binom{2k}{k}4^{-k}. \]

We can also write $\binom{k}{2k-n} = \binom{k}{k - (2k - n)} = \binom{k}{n-k}$. Thus we have the nicer looking formula

\[ a_n = \sum_{k=0}^{n} \binom{2k}{k}\binom{k}{n-k}\,2^{-k}\,(3/2)^{n-k}. \]
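The recursion from (b) and the sum from (c) can be compared in exact arithmetic. The sketch below uses the rearranged recursion $(n+1)a_{n+1} = (2n+1)a_n + 3na_{n-1}$ and the sum written with the factor $2^{-k}(3/2)^{n-k}$, which equals $2^{-n}3^{n-k}$; the function names are ours.

```python
from fractions import Fraction
from math import comb

def by_recursion(N):
    """a_0..a_N from (n+1) a_{n+1} = (2n+1) a_n + 3n a_{n-1}, with a_0 = a_1 = 1."""
    a = [Fraction(1), Fraction(1)]
    for n in range(1, N):
        a.append(((2 * n + 1) * a[n] + 3 * n * a[n - 1]) / (n + 1))
    return a[: N + 1]

def by_sum(n):
    """The closed sum over k = 0..n obtained from the binomial expansion."""
    return sum(
        comb(2 * k, k) * comb(k, n - k) * Fraction(1, 2**k) * Fraction(3, 2) ** (n - k)
        for k in range(n + 1)
    )

print([int(v) for v in by_recursion(8)])
assert by_recursion(8) == [by_sum(n) for n in range(9)]
```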


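10.3.3 (b) We have $-2\ln(1 - x) - 2x = \sum_{k \ge 2} \dfrac{2x^k}{k}$ and

\[ (1 - x)^{-2} = \sum_{k \ge 0} \binom{-2}{k}(-x)^k = \sum_{k \ge 0} (k + 1)x^k. \]

By the formula for the coefficients in a product of generating functions, the coefficient of $x^n$ is

\[ \begin{aligned} \sum_{k=2}^{n} \frac{2}{k}(n - k + 1) &= 2(n + 1)\sum_{k=2}^{n} \frac{1}{k} - \sum_{k=2}^{n} 2 \\ &= 2(n + 1)\sum_{k=1}^{n} \frac{1}{k} - 2(n + 1) - 2(n - 1) \\ &= 2(n + 1)\sum_{k=1}^{n} \frac{1}{k} - 4n. \end{aligned} \]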

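10.3.4. We multiply by $x$ and differentiate:

\[ (xT(x))' = -\frac{1}{2} + \frac{1 + 3x}{2\sqrt{1 - 2x - 3x^2}}. \]

Next we multiply by $1 - 2x - 3x^2$:

\[ (1 - 2x - 3x^2)(xT(x))' = -\frac{1 - 2x - 3x^2}{2} + (1 + 3x)\,\frac{\sqrt{1 - 2x - 3x^2}}{2} = 2x - (1 + 3x)\,xT(x). \]

Extracting coefficients using $[x^n]\,(xT(x))' = (n + 1)t_n$:

\[ (n + 1)t_n - 2nt_{n-1} - 3(n - 1)t_{n-2} = a_n - t_{n-1} - 3t_{n-2} \quad\text{where}\quad a_n = \begin{cases} 2, & \text{if } n = 1, \\ 0, & \text{otherwise.} \end{cases} \]

Rearranging:

\[ t_n = \frac{(2n - 1)t_{n-1} + 3(n - 2)t_{n-2} + a_n}{n + 1} \]

for $n \ge 0$ with the understanding that $t_n = 0$ when $n < 0$.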
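This linear recursion can be checked against the convolution recursion $t_{n+1} = a_{n+1} + t_n + \sum_k t_kt_{n-k}$ of Exercise 10.2.6 (where $a_1 = 1$), assuming both exercises refer to the same $T(x)$ with $t_0 = 0$. The following Python sketch uses function names of our own choosing.

```python
def by_convolution(N):
    """t_0..t_N from t_{n+1} = a_{n+1} + t_n + sum_k t_k t_{n-k}, with t_0 = 0, a_1 = 1."""
    t = [0]
    for n in range(N):
        a = 1 if n + 1 == 1 else 0
        t.append(a + t[n] + sum(t[k] * t[n - k] for k in range(n + 1)))
    return t

def by_linear(N):
    """The same numbers from (n+1) t_n = (2n-1) t_{n-1} + 3(n-2) t_{n-2} + a_n, a_1 = 2."""
    t = []
    for n in range(N + 1):
        t1 = t[n - 1] if n >= 1 else 0
        t2 = t[n - 2] if n >= 2 else 0
        a = 2 if n == 1 else 0
        # The division is exact because the t_n are integers.
        t.append(((2 * n - 1) * t1 + 3 * (n - 2) * t2 + a) // (n + 1))
    return t

print(by_linear(10))
assert by_linear(12) == by_convolution(12)
```

The linear recursion needs only a fixed amount of work per term, whereas the convolution needs a sum of $n + 1$ products.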
Section 10.4

10.4.1  (a)   This  is  nothing  more  than  a  special  case  of  the  Rule  of  Product — at  each  time  we  can 
choose  anything  from  T. 

(b)  Simply  sum  the  previous  result  on  k. 

(c)  The  hint  tells  how  to  do  it.  All  that  is  left  is  algebra. 

(d)  The  solution  is  like  that  in  the  previous  part,  except  that  we  start  with 


/  \ 


TeT  ^i=0  '  TeT 

10.4.2. The rewriting should be fairly obvious. If $A(x)$ is the generating function for the alternating sequences, the Rules of Sum and Product give us $S(x) = A(x) + S(x)\,x\,A(x)$. As in Exercise 10.2.4,

\[ A(x) = 1 + 2x + 2x^2 + 2x^3 + \cdots = 1 + \frac{2x}{1 - x} = \frac{1 + x}{1 - x}. \]

10.4.3 (a) This is simply 2*{0, 12*}*. Thus the generating function is

\[ A(x) = \frac{1}{1 - x} \cdot \frac{1}{1 - \left(x + \dfrac{x}{1 - x}\right)} = \frac{1}{1 - 3x + x^2}. \]

Multiply both sides by $1 - 3x + x^2$ and equate coefficients of $x^n$ to obtain the recursion

\[ a_n = 3a_{n-1} - a_{n-2} \quad\text{for } n > 1 \]

with initial conditions $a_0 = 1$ and $a_1 = 3$.

(b) You should be able to see that this is described by 0*(11*0^k 0*)*. Since

\[ G_{11^*0^k0^*}(x) = x\,\frac{1}{1 - x}\,x^k\,\frac{1}{1 - x} = \frac{x^{k+1}}{(1 - x)^2}, \]

the generating function we want is

\[ A(x) = \frac{1}{1 - x} \cdot \frac{1}{1 - x^{k+1}/(1 - x)^2} = \frac{1 - x}{1 - 2x + x^2 - x^{k+1}}. \]

Clearing of fractions and equating coefficients, we obtain the recursion

\[ a_n = 2a_{n-1} - a_{n-2} + a_{n-k-1} \quad\text{for } n > 1 \]

with the understanding that $a_j = 0$ for $j < 0$. The initial conditions are $a_0 = a_1 = 1$.

(c) A possible formulation is

0*(1(11)*00*)*{λ, 1(11)*}.

This says, start with any number of zeroes, then append any number of copies of the patterns of type Z (described soon) and then follow by either nothing or an odd number of ones. A pattern of type Z is an odd number of ones followed by one or more zeroes. The translation to a generating function gives

\[ A(x) = \frac{1}{1 - x} \cdot \frac{1}{1 - G_Z(x)}\left(1 + x\,\frac{1}{1 - x^2}\right) \quad\text{where}\quad G_Z(x) = x\,\frac{1}{1 - x^2}\,x\,\frac{1}{1 - x}. \]

After some algebra, the generating function reduces to

\[ A(x) = \frac{1 + x - x^2}{1 - x - 2x^2 + x^3}, \]

which gives $a_n = a_{n-1} + 2a_{n-2} - a_{n-3}$ for $n > 2$, with initial conditions $a_0 = 1$, $a_1 = 2$ and $a_2 = 3$.
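Because each description above is a regular expression, the recursions can be tested by listing all short strings and counting those that match. The sketch below uses Python's `re` module with the three patterns transcribed directly (taking $k = 2$ in (b) as an arbitrary illustrative choice); the helper names are ours.

```python
import re
from itertools import product

def count(pattern, alphabet, n):
    """Number of n-long strings over `alphabet` that match `pattern` entirely."""
    rx = re.compile(pattern)
    return sum(1 for s in product(alphabet, repeat=n) if rx.fullmatch("".join(s)))

def from_recursion(init, step, N):
    """Extend the initial values with a_n = step(a, n) up to index N."""
    a = list(init)
    while len(a) <= N:
        a.append(step(a, len(a)))
    return a[: N + 1]

def term(a, j):
    """a_j with the convention a_j = 0 for j < 0."""
    return a[j] if j >= 0 else 0

N, k = 8, 2
ra = from_recursion([1, 3], lambda a, n: 3 * a[n - 1] - a[n - 2], N)
rb = from_recursion([1, 1], lambda a, n: 2 * a[n - 1] - a[n - 2] + term(a, n - k - 1), N)
rc = from_recursion([1, 2, 3], lambda a, n: a[n - 1] + 2 * a[n - 2] - a[n - 3], N)

pa = r"2*(?:0|12*)*"                         # (a) 2*{0, 12*}*
pb = rf"0*(?:11*0{{{k}}}0*)*"                # (b) 0*(11*0^k 0*)*
pc = r"0*(?:1(?:11)*00*)*(?:1(?:11)*)?"      # (c) 0*(1(11)*00*)*{lambda, 1(11)*}

assert ra == [count(pa, "012", n) for n in range(N + 1)]
assert rb == [count(pb, "01", n) for n in range(N + 1)]
assert rc == [count(pc, "01", n) for n in range(N + 1)]
print(ra, rb, rc, sep="\n")
```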
10.4.4. The generating functions can be worked out just as was done for $p_n$ in the text: Place balls into boxes with the number in the $i$th box being a multiple of $i$. If repeated parts are not allowed, the $i$th box receives either zero or $i$ balls.

(b) The coefficient of $x^n$ of the right hand side is zero and of the left hand side is $\sum_{k=0}^{n} (-1)^kq_kp_{n-k}$. Rearranging gives

\[ \sum_{k=0}^{n/2} q_{2k}\,p_{n-2k} = \sum_{k=0}^{n/2} q_{2k+1}\,p_{n-2k-1}. \]

One way to describe this is that if we look at all pairs of partitions such that (i) the first partition has distinct parts and (ii) the sum of all the parts in both partitions is $n$, then for exactly half of the pairs the first partition is a partition of an odd number. We don't have a direct proof.

(c) The answers are

\[ \prod_{i=1}^{k} (1 + x^i) \quad\text{and}\quad \prod_{i=1}^{k} \frac{1}{1 - x^i}. \]

10.4.5. Here's a way to construct a pile of height $h$. Look at the number of blocks in each column. The numbers increase to $h$, possibly stay at $h$ for some time, and then fall off. The numbers up to but not including the first $h$ form a partition of a number with largest part at most $h - 1$ and the numbers after the first $h$ form a partition of a number with largest part at most $h$. The structures are these partitions. By the Rule of Product and Exercise 10.4.4, the generating function for piles of height $h$ is

\[ x^h\left(\prod_{i=1}^{h-1} \frac{1}{1 - x^i}\right)\left(\prod_{i=1}^{h} \frac{1}{1 - x^i}\right) = \frac{x^h(1 - x^h)}{\prod_{i=1}^{h} (1 - x^i)^2}. \]

Summing this over all $h > 0$ and adding 1 gives $\sum s_nx^n$. No simple formula is known for the sum.

10.4.6 (a) Define a map $f$ by $b_1 = a_1$ and $b_i = a_i - a_{i-1}$ for $i > 1$. Then $b_i > 0$ for $1 \le i \le k$. Note that $b_1 + b_2 + \cdots + b_j = a_j$. Thus the sum of the $b_i$'s does not exceed $n$. Also, the map $f$ is invertible and any $k$ positive $b_j$'s will give $k$ strictly increasing $a_j$'s. Thus $f$ is a bijection.

(b) Let your $j$th choice be $b_j$ for $1 \le j \le k$. Then choose the amount by which $n$ will exceed the sum. Keep track of the $b_j$ values and the difference between $n$ and the sum. Adding these together gives $n$. Since there is exactly one way to make $b_j = i$ for $i > 0$ and there is exactly one way to choose the difference between $n$ and the sum to be $i$ for $i \ge 0$, the result follows.

(c) Note that $i$ and $a_i$ have the same parity for all $i$ if and only if all the $b_j$'s are odd. Reasoning as in the previous part we obtain

$$T(x) = \Bigl(\sum_{i\ge 0} x^{2i+1}\Bigr)^k (1-x)^{-1} = \Bigl(\frac{x}{1-x^2}\Bigr)^k \frac{1}{1-x} = \frac{x^k(1+x)}{(1-x^2)^{k+1}}.$$

(d) Expand the generating function. The coefficient of $x^n$ is $(-1)^j\binom{-k-1}{j}$, where $j = (n-k)/2$ when $n-k$ is even and $j = (n-k-1)/2$ otherwise. Thus $j = \lfloor (n-k)/2 \rfloor$. Use $(-1)^j\binom{-k-1}{j} = \binom{k+j}{j}$.
(e) We have a succession for $(a_{j-1}, a_j)$ if and only if $b_j = 1$. Note that $j > 1$.

(f) By the binomial theorem applied to $(x + (1-x)y)^{k-1}$, the coefficient of $y^j$ in the previous generating function is the given expression.

(g) The answer is $\binom{k-1}{j}\binom{t+k-j}{k-j}$ where $t + 2k - j - 1 = n$. Thus $t = n - 2k + j + 1$ and so the answer is $\binom{k-1}{j}\binom{n-k+1}{k-j}$. There are 15 subsets: the three subsets 1246, 1346 and 1356 each have one succession and the three subsets 1234, 2345 and 3456 each have three successions. The remainder have two successions. Thus, with $k = 4$, we have $s_{6,1} = 3$, $s_{6,2} = 9$, $s_{6,3} = 3$ and all other $s_{6,j}$'s are zero, which agrees with the formula.
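The count of 4-subsets of $\{1,\ldots,6\}$ by number of successions is easy to confirm by enumeration. The sketch below is ours, with hypothetical function names. The closed form $\binom{k-1}{j}\binom{n-k+1}{k-j}$ used for comparison is a standard formula that we assume matches the one intended in part (g).

```python
from collections import Counter
from itertools import combinations
from math import comb


def succession_counts(n, k):
    """Map j -> number of k-subsets of {1..n} having exactly j successions."""
    counts = Counter()
    for subset in combinations(range(1, n + 1), k):
        # a succession is a pair of consecutive chosen integers
        j = sum(1 for a, b in zip(subset, subset[1:]) if b - a == 1)
        counts[j] += 1
    return dict(counts)


def closed_form(n, k, j):
    """Assumed closed form: choose the k-j runs, then place them among the gaps."""
    return comb(k - 1, j) * comb(n - k + 1, k - j)
```

For example, `succession_counts(6, 4)` should reproduce the three nonzero values quoted in part (g).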

10.4.7 (a) We can build the trees by taking a root and joining to it zero, one or two binary RP-trees. This gives us $T(x) = x(1 + T(x) + T(x)^2)$.

(b) There is no simple expansion for the square root; however, various things can be done. One possibility is to use $\sqrt{1-2x-3x^2} = \sqrt{1-3x}\,\sqrt{1+x}$. You can then expand each square root and multiply the generating functions together. This leads to a summation of about $n$ terms for $p_n$. The terms alternate in sign. A better approach is to write

$$\sqrt{1-2x-3x^2} = \sum_{k}\binom{1/2}{k}(-1)^k(2x+3x^2)^k = \sum_{k,j}(-1)^k\binom{1/2}{k}\binom{k}{j}2^{k-j}3^{j}x^{k+j}.$$

This leads to a summation of about $n/2$ positive terms for $p_n$. It's also possible to get a recursion by constructing a first order, linear differential equation with polynomial coefficients for $T(x)$ as done in Exercise 10.2.6. Since the recursion contains only two terms, it's the best approach if we want to compute a table of values. It's also the easiest to program on a computer.

10.4.8. Let the number be $q_n$. We can build the trees by taking a root and using nothing or a binary RP-tree for each of its sons. This gives us $Q(x) = x(1 + Q(x))^2$. Solving the quadratic and using $q_0 = 0$: $Q(x) = (1 - 2x - \sqrt{1-4x})/2x$. Comparing this with the generating function for full binary RP-trees, we see that $q_n$ is the number of full binary RP-trees with $n+1$ leaves when $n > 0$.

There is a simple bijection that proves this equality. If a node has fewer than two sons, add sons to bring the total to two. This gives a full binary RP-tree. The procedure is clearly reversible. What happens to the number of nodes? All nodes of the original tree become internal nodes in the full tree. Since a full binary RP-tree has one more leaf than internal vertices, we are done.

10.4.9. The key to working this problem is to never allow the root to have exactly one son.

(a) Let the number be $r_n$. The generating function for those trees whose root has degree $k$ is $xR(x)^k$. Since $\sum_{k\ge 0} R(x)^k = 1/(1-R(x))$, we have $R(x) = \frac{x}{1-R(x)} - xR(x)$. Clearing of fractions and solving the quadratic:

$$R(x) = \frac{1 + x - \sqrt{1-2x-3x^2}}{2(1+x)}.$$

(The minus sign is the correct choice for the square root because $R(0) = r_0 = 0$.) These numbers are closely related to $p_n$ in Exercise 10.4.7. By comparing the equations for the generating functions,

$$(1+x)R(x) = x(P(x)+1)$$

and so $r_n + r_{n-1} = p_{n-1}$ when $n > 1$.

(b) We modify the previous idea to count by leaves: $R(x) = x + \sum_{k\ge 2} R(x)^k = x + R(x)^2/(1-R(x))$. Solving the quadratic:

$$R(x) = \frac{1 + x - \sqrt{1-6x+x^2}}{4}.$$

(c) From (a) we have $2(1+x)R - 1 - x = -\sqrt{1-2x-3x^2}$ and so, differentiating,

$$2(1+x)R' + 2R - 1 = \frac{1+3x}{\sqrt{1-2x-3x^2}}.$$

Thus $(1-2x-3x^2)\bigl(2(1+x)R' + 2R - 1\bigr) = (1+3x)\bigl(-2(1+x)R + 1 + x\bigr)$. Dividing by $2(1+x)$ and collecting terms gives $(1-2x-3x^2)R' + 2R - 1 = 0$. Equating coefficients of $x^n$ gives us

$$(n+1)r_{n+1} - 2n\,r_n - 3(n-1)r_{n-1} + 2r_n = 0 \quad\text{for } n \ge 1.$$

Rearranging and checking initial conditions we have

$$r_{n+1} = \frac{(n-1)(2r_n + 3r_{n-1})}{n+1} \quad\text{for } n \ge 2,$$

with $r_0 = r_2 = 0$ and $r_1 = 1$. You should be able to treat (b) in a similar manner. The result is $r_0 = 0$, $r_1 = r_2 = 1$ and, for $n \ge 2$,

$$r_{n+1} = \frac{3(2n-1)r_n - (n-2)r_{n-1}}{n+1}.$$
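The relations in 10.4.9 can be tested by expanding the functional equations directly. The following Python sketch is ours and uses hypothetical function names. It iterates each functional equation on truncated power series, which is assumed to be adequate because every pass fixes at least one more coefficient. It then checks $r_n + r_{n-1} = p_{n-1}$ from (a) and the recursion stated for (b).

```python
def mul(a, b, N):
    """Product of two power series (coefficient lists), truncated at x^N."""
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c


def power_sum(R, N, start):
    """Coefficients of R^start + R^(start+1) + ... through x^N, assuming R(0) = 0."""
    total = [0] * (N + 1)
    power = [1] + [0] * N  # R^0
    for k in range(N + 1):
        if k >= start:
            total = [t + p for t, p in zip(total, power)]
        power = mul(power, R, N)
    return total


def fixed_point(step, N):
    """Apply step N+1 times starting from the zero series."""
    series = [0] * (N + 1)
    for _ in range(N + 1):
        series = step(series)
    return series


def by_vertices(N):
    """R = x(1 + R^2 + R^3 + ...): no vertex has exactly one son, counted by vertices."""
    def step(R):
        S = power_sum(R, N, 2)
        S[0] += 1
        return [0] + S[:N]  # multiply by x
    return fixed_point(step, N)


def by_leaves(N):
    """R = x + R^2 + R^3 + ...: the same trees counted by leaves."""
    def step(R):
        S = power_sum(R, N, 2)
        S[1] += 1
        return S
    return fixed_point(step, N)


def unary_binary(N):
    """P = x(1 + P + P^2), the equation from Exercise 10.4.7."""
    def step(P):
        S = [p + q for p, q in zip(P, mul(P, P, N))]
        S[0] += 1
        return [0] + S[:N]
    return fixed_point(step, N)
```

The lists returned are indexed by the exponent of $x$, so `by_vertices(7)[5]` is $r_5$ for part (a).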

10.4.10 (a) By the Rules of Sum and Product and our usual recursive method for building trees,

$$T(x,y) = x + y\sum_{k\ge 1} T(x,y)^k = x + \frac{y\,T(x,y)}{1-T(x,y)}.$$

Clearing of fractions and rearranging gives the quadratic $T^2 - (1+x-y)T + x = 0$, which has the solutions

$$T(x,y) = \tfrac12\Bigl(1 + x - y \pm \sqrt{1 - 2x - 2y + (x-y)^2}\Bigr),$$

and $t_{0,0} = 0$ shows that the minus sign is correct.

(b) Since the expression inside the square root is unchanged when $x$ and $y$ are interchanged, $U(x,y) = T(x,y) - \frac{x-y}{2}$ satisfies $U(x,y) = U(y,x)$. Equating coefficients of $x^n y^k$ gives $u_{n,k} = u_{k,n}$. Since $u_{n,k} = t_{n,k}$ whenever $n + k > 1$, we are done.

(c) Let $f$ be the map. Define $f(\bullet) = \bullet$. Let $T$ be a tree whose sons are the roots of $T_1, \ldots, T_k$. The following picture is a local (or recursive) description of $f(T)$.

[Picture of $f(T)$ omitted.]

10.4.11 (a) A tree of outdegree $D$ consists of a root and some number $d \in D$ of trees of outdegree $D$ joined to the root. Use the Rules of Sum and Product.

(b) Let $f_0(x) = 1$. Define $f_{n+1}(x) = x\sum_{d\in D} f_n(x)^d$. We leave it to you to prove by induction that $f_n(x)$ agrees with $T_D(x)$ through terms of degree $n$.

(c) Except for the $1 \in D$ question, this is handled as we did $T_D(x)$. Why must we have $1 \notin D$? You should be able to see that there are an infinite number of trees with exactly one leaf: construct a tree that is just a path of length $n$ from the root to the leaf.

10.4.12.  We  can  place  zero  balls  in  (1,  *),  (2,  *),  (3,  *)  in  just  one  way — leave  them  empty.  We  can 
place  5  balls  there  in  five  ways,  the  placements  being 

5,0,0    3,2,0    2,0,3    1,4,0  0,2,3. 

Working  out  similar  results  for  1,  2,  3  and  4  balls,  we  get  from  the  hint  that  the  answer  is 

$$(1 + x + 2x^2 + 3x^3 + 4x^4 + 5x^5)^n.$$

10.4.13. We can build these trees up the way we built the full binary RP-trees: join two trees at a root. If we distinguish right and left sons, every case will be counted twice, except when the two sons are the same. Thus $B(x) = x + (B(x)^2 - E(x))/2 + E(x)$, where $E(x)$ counts the situation where both sons are the same and nonempty. We get this by choosing a son and then duplicating it. Thus each leaf in the son is replaced by two leaves and so $E(x) = B(x^2)$.
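Since $E(x) = B(x^2)$, the equation becomes $B(x) = x + (B(x)^2 + B(x^2))/2$, which determines the coefficients one at a time. The sketch below is ours and the function name is hypothetical. It reads the recursion off the generating function.

```python
def unordered_binary_trees(N):
    """b[n] = coefficient of x^n in B(x) = x + (B(x)^2 + B(x^2)) / 2, for 0 <= n <= N."""
    b = [0] * (N + 1)
    if N >= 1:
        b[1] = 1  # the single-leaf tree
    for n in range(2, N + 1):
        ordered_pairs = sum(b[i] * b[n - i] for i in range(1, n))  # [x^n] B(x)^2
        twins = b[n // 2] if n % 2 == 0 else 0                     # [x^n] B(x^2)
        b[n] = (ordered_pairs + twins) // 2
    return b
```

For instance, `unordered_binary_trees(8)` lists the number of such trees with up to 8 leaves.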
10.4.14 (a) There is a bijection between the sequences and $2n$-long sequences of $(1,0)$ and $(0,1)$ containing an equal number of $(1,0)$ and $(0,1)$ and satisfying one more property to stay above $y = x$. The $x$ coordinate is the number of $(1,0)$'s used and the $y$ coordinate, which must be larger, is the number of $(0,1)$'s. If we replace $(1,0)$ with $-1$ and $(0,1)$ with $+1$, the difference in coordinates is the sum of the $+1$'s and $-1$'s up to that point.

(b) Call the original sequence type I and the new sequence (end values deleted) type II. We must have $s_1 = 1$ and $s_{2n} = -1$. The generating function for type I sequences is $x$ times the generating function for type II sequences since there is a bijection between type I sequences of length $2n$ and type II sequences of length $2(n-1)$. The partial sums of a type II sequence are nonnegative and the entire sum is 0. Suppose the partial sums are 0 at $k = j_1, j_2, \ldots, 2n-1$. Break the sequence into subsequences between $s_{j_t}$ and $s_{j_t+1}$ for all $t$. The resulting subsequences all sum to 0 and have partial sums strictly positive, thus they are type I. Hence all type II sequences are the juxtaposition of several type I sequences. Translating these ideas into generating functions gives the formula.

(c) We have $S = x/(1-S)$ and so by algebra $S = x + S^2$, the equation for unlabeled full binary RP-trees by leaves and unlabeled RP-trees by vertices. Thus $s_n = b_n$, which we've already computed.

(d) Here is a recursive construction of a bijection $f : T_n \to S_n$. Suppose that $T$ is an RP-tree and that the sons of the root are $T_1, \ldots, T_k$ in order. (We allow $k = 0$, which corresponds to $T = \bullet$.) Let $f(T) = 1, f(T_1), \ldots, f(T_k), -1$. It is easily seen by induction on the number of vertices that the length of $f(T)$ is twice the number of vertices in $T$. Here is an important observation for later use. If $f(T) = s_1, \ldots, s_m$, then $s_2 + \cdots + s_j \ge 0$ for $2 \le j < m$, with equality if and only if $s_j$ is the last term in the sequence of one of the $f(T_i)$'s. This observation is easily proved by induction on the length of the sequence.

We now indicate how to prove that $f$ is a bijection. One way to do this is to exhibit a function $g : S_n \to T_n$ such that $g = f^{-1}$; i.e., $g(f(T)) = T$ for all $T \in T_n$ and $f(g(S)) = S$ for all $S \in S_n$. We'll define $g$ recursively. The proof that $g = f^{-1}$ is then done by induction. Let $S = s_1, \ldots, s_{2n}$ be a sequence in $S_n$. If $n = 1$, define $g(S) = \bullet$. Define $j_0 = 1 < j_1 < \cdots < j_t = 2n-1$ by the condition that this sequence contain every $2 \le j < 2n$ such that $s_2 + \cdots + s_j = 0$. Let $g(S)$ be a tree whose root has $t$ sons, the $k$th son being $g(s_{j_{k-1}+1}, \ldots, s_{j_k})$. A key fact in proving $g = f^{-1}$ by induction is the important observation we previously made. Using it, one can show that $g(f(T))$ constructs a tree such that the $k$th son of the root is $g(f(T_k))$, where $T_k$ is the $k$th son of the root of $T$. That allows one to proceed by induction. Similarly, one can use induction to compute $f(g(S))$.
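The maps $f$ and $g$ of part (d) translate almost word for word into code. In the sketch below, which is ours, an RP-tree is represented as a nested tuple of its sons, so `()` is the single vertex $\bullet$. This representation and the function names are assumptions made for illustration.

```python
def encode(tree):
    """f(T) = 1, f(T_1), ..., f(T_k), -1 for a tree given as a tuple of its sons."""
    seq = [1]
    for son in tree:
        seq.extend(encode(son))
    seq.append(-1)
    return seq


def decode(seq):
    """g(S): cut s_2..s_{2n-1} wherever the partial sum returns to 0, then recurse."""
    inner = seq[1:-1]
    sons, start, total = [], 0, 0
    for pos, step in enumerate(inner):
        total += step
        if total == 0:  # end of the block coming from one son
            sons.append(decode(inner[start:pos + 1]))
            start = pos + 1
    return tuple(sons)


def forests(m):
    """All ordered lists of RP-trees with m vertices in total."""
    if m == 0:
        return [()]
    result = []
    for size in range(1, m + 1):
        for first in all_trees(size):
            for rest in forests(m - size):
                result.append((first,) + rest)
    return result


def all_trees(n):
    """All RP-trees with n vertices: a root plus an ordered forest on n-1 vertices."""
    return forests(n - 1)
```

For example, `encode(((), ((),)))` gives `[1, 1, -1, 1, 1, -1, -1, -1]`, a sequence of length 8 for a tree with 4 vertices, and `decode` recovers the tree.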

10.4.15  (a)  Either  the  list  consists  of  repeats  of  just  one  item  OR  it  consists  of  a  list  of  the  proper 
form  AND  a  list  of  repeats  of  one  item.  In  the  first  case  we  can  choose  the  item  in  s  ways  and 
use  it  any  number  of  times  from  1  to  k.  In  the  second  case,  we  can  choose  the  final  repeating 
item  in  only  s  —  1  ways  since  it  must  differ  from  the  item  preceding  it. 

(b) After a bit of algebra,

$$A_k(x) = \frac{s/(s-1)}{1-(s-1)(x+x^2+\cdots+x^k)} - \frac{s}{s-1} = \frac{s(1-x)/(s-1)}{1 - sx + (s-1)x^{k+1}} - \frac{s}{s-1}.$$

(c) Multiplying both sides of the formula just obtained for $A_k(x)$ by $1 - sx + (s-1)x^{k+1}$ gives the desired result.

(d) Call a sequence of the desired sort acceptable. Add anything to the end of an $n$-long acceptable sequence. This gives $s\,a_{n,k}$ sequences. Each of these is either an acceptable sequence of length $n+1$ or an $(n-k)$-long acceptable sequence followed by $k+1$ copies of something different from the last entry in the $(n-k)$-long sequence.
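The argument in (d) amounts to the recursion $a_{n+1,k} = s\,a_{n,k} - (s-1)a_{n-k,k}$. The Python sketch below is ours and checks it by brute force for small cases. It assumes that "acceptable" means no item is repeated more than $k$ times in a row, and it only tests $n - k \ge 1$ so that no convention for the empty sequence is needed. The function names are hypothetical.

```python
from itertools import product


def count_acceptable(n, s, k):
    """Count n-long sequences over s symbols in which no run is longer than k."""
    def ok(seq):
        run = 1
        for a, b in zip(seq, seq[1:]):
            run = run + 1 if a == b else 1
            if run > k:
                return False
        return True
    return sum(1 for seq in product(range(s), repeat=n) if ok(seq))


def recursion_holds(s, k, max_n):
    """Check a_{n+1} = s*a_n - (s-1)*a_{n-k} for k+1 <= n < max_n."""
    a = {n: count_acceptable(n, s, k) for n in range(1, max_n + 1)}
    return all(a[n + 1] == s * a[n] - (s - 1) * a[n - k] for n in range(k + 1, max_n))
```

A call such as `recursion_holds(3, 2, 7)` enumerates at most $3^7$ sequences, so it runs quickly.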


s/{s-l) 
10.4.16 (a) We have the following regular expressions and generating functions:

$0^*$: $\dfrac{1}{1-x}$

$000^*$: $\dfrac{x^2}{1-x}$

$1 \cup 11$: $x + x^2$

$Z = 000^*(1 \cup 11)$: $\dfrac{x^2}{1-x}\,(x + x^2) = \dfrac{x^3+x^4}{1-x}$

$Z^*$: $\dfrac{1}{1-Z} = \dfrac{1-x}{1-x-x^3-x^4}$

$0^*(1 \cup 11)Z^*0^*$: $\dfrac{x+x^2}{(1-x)^2}\cdot\dfrac{1-x}{1-x-x^3-x^4} = \dfrac{x+x^2}{(1-x)(1-x-x^3-x^4)}$

Thus

$$A(x) = \frac{1}{1-x} + \frac{x+x^2}{(1-x)(1-x-x^3-x^4)} = \frac{1+x^2-x^3-x^4}{(1-x)(1-x-x^3-x^4)} = \frac{1+x+2x^2+x^3}{1-x-x^3-x^4}.$$

(b) The recursion follows from the coefficient of $x^n$ in $(1 - x - x^3 - x^4)A(x) = 1 + x + 2x^2 + x^3$. The initial conditions are

$$a_0 = 1 \qquad a_1 = 2 \qquad a_2 = 4 \qquad a_3 = 6,$$

which can be obtained from the generating function or by noting that the only sequences of length at most 3 that are forbidden are 101 and 111.

(c) By partial fractions,

$$A(x) = \frac15\Bigl(\frac{7+4x}{1-x-x^2} - \frac{2+x}{1+x^2}\Bigr).$$

Since $\frac{1}{1+x^2} = \sum_{k\ge 0}(-1)^k x^{2k}$, you should be able to complete the derivation of the formula.

(d) We have $F_2 = 1$, $F_3 = 2$ and $F_4 = 3$. For the initial conditions,

$a_0 = 1$ and $F_2^2 = 1$

$a_1 = 2$ and $F_2F_3 = 2$

$a_2 = 4$ and $F_3^2 = 2^2 = 4$

$a_3 = 6$ and $F_3F_4 = 2 \times 3 = 6$.

The recursion will be satisfied if

$$F_{n+2}^2 = F_{n+1}F_{n+2} + F_nF_{n+1} + F_n^2 \quad\text{and}\quad F_{n+2}F_{n+3} = F_{n+2}^2 + F_{n+1}^2 + F_nF_{n+1}$$

for $n > 0$. We have

$$F_{n+1}F_{n+2} + F_nF_{n+1} + F_n^2 = F_{n+1}F_{n+2} + F_n(F_{n+1} + F_n) = F_{n+1}F_{n+2} + F_nF_{n+2} = (F_{n+1} + F_n)F_{n+2} = F_{n+2}^2$$

and

$$F_{n+2}^2 + F_{n+1}^2 + F_nF_{n+1} = F_{n+2}^2 + (F_{n+1} + F_n)F_{n+1} = F_{n+2}^2 + F_{n+2}F_{n+1} = F_{n+2}(F_{n+2} + F_{n+1}) = F_{n+2}F_{n+3}.$$



10.4.17. We have $1 - 3x + x^2 = (1 - ax)(1 - bx)$ where $a = \frac{3+\sqrt5}{2}$ and $b = \frac{3-\sqrt5}{2}$. Then

$$\frac{x}{1-3x+x^2} = \frac{1/(a-b)}{1-ax} - \frac{1/(a-b)}{1-bx}$$

and so, since $a - b = \sqrt5$,

$$r_n = \frac{a^n - b^n}{\sqrt5}.$$

10.4.18 (a) This problem differs from Example 10.17 in one important respect: The vertex 1 in the spanning tree need not have a nice position on the $T$ tree containing it. In fact, it can be any vertex in that tree except 0. Therefore, we must have one special $T$-like tree, namely one with the vertex that will become 1 in the spanning tree specially marked. We place this special tree at the start of our list and proceed as indicated.

(b) $G_{T'} = \sum k^2x^k = (x\,d/dx)(x\,d/dx)\sum x^k = (x\,d/dx)\bigl(x/(1-x)^2\bigr) = x(1+x)/(1-x)^3$.

(c) We have the generating function

$$\sum_{k\ge 0} G_{T'}(G_T)^k = \frac{G_{T'}}{1-G_T} = \frac{x(1+x)/(1-x)^3}{1 - x/(1-x)^2} = \frac{x(1+x)}{(1-x)\bigl((1-x)^2 - x\bigr)}.$$

(d) By partial fractions, the generating function is

$$\frac{2-3x}{1-3x+x^2} - \frac{2}{1-x}.$$

The first fraction is $\frac{2}{x} - 3$ times the generating function for $r_n$.

10.4.19 (a) The accepting states are unchanged except that if the old start state was accepting, both the old and new start states are accepting. If there was an edge from the old start state to state $t$ labeled with input $i$, then add an edge from the new start state to $t$ labeled with $i$. (The old edge is not removed.) We can express this in terms of the map $f : S \times I \to 2^S$ for the nondeterministic automaton. Let $s_0 \in S$ be the old start state and introduce a new start state $s_n$. Let $T = S \cup \{s_n\}$ and define $f^* : T \times I \to 2^T$ by

$$f^*(t,i) = \begin{cases} f(t,i), & \text{if } t \in S,\\ f(s_0,i), & \text{if } t = s_n.\end{cases}$$

(b) Label the states of $A$ and $B$ so that they have no labels in common. Call their start states $s_A$ and $s_B$. Add a new start state $s_n$ that has edges to all of the states that $s_A$ and $s_B$ did. In other words, $f^*(s_n, i)$ is the union of $f_A(s_A, i)$ and $f_B(s_B, i)$, where $f_A$ and $f_B$ are the functions for $A$ and $B$. If either $s_A$ or $s_B$ was an accepting state, so is $s_n$; otherwise the accepting states are unchanged.

(c) Add the start state of $S(A)$ to the accepting states. (This allows the machine to accept the empty string, which is needed since $*$ means "zero or more times.") Run edges from the accepting states of $S(A)$ to those states that the start state of $S(A)$ goes to. In other words, if $s$ is the start state,

$$f^*(t,i) = \begin{cases} f(t,i), & \text{if } t \text{ is not an accepting state},\\ f(t,i) \cup f(s,i), & \text{if } t \text{ is an accepting state}.\end{cases}$$

(d) From each accepting state of $A$, run an edge to each state to which the start state of $B$ has an edge. The accepting states of $B$ are accepting states. If the start state of $B$ is an accepting state, then the accepting states of $A$ are also accepting states, otherwise they are not. The start state is the start state of $A$.

Figure S.11.1 The state transition digraph for covering a 3 by $n$ board with dominoes. Each vertex is labeled with a triple that indicates whether commitment has been made in that row (C) or not made (N). The start and end states are those with no commitments.

Section  11.1 

11.1.1. The problem is to eliminate all but the $c$'s from the recursion. One can develop a systematic method for doing this, but we will not since we have generating functions at our disposal. In this particular case, let $p_n = f_n + s_n$ and note that $p_n - p_{n-1} = 2c_{n-1}$ by (11.5). Thus, by the first of (11.4), this result and the last of (11.5),

$$\begin{aligned} c_{n+1} - c_n &= (2c_n + p_n + c_{n-1}) - (2c_{n-1} + p_{n-1} + c_{n-2})\\ &= 2c_n - c_{n-1} - c_{n-2} + (p_n - p_{n-1})\\ &= 2c_n + c_{n-1} - c_{n-2}.\end{aligned}$$

11.1.3 (a) Figure S.11.1 gives a state transition digraph.

Let $a_{n,s}$ be the number of ways to take $n$ steps from the state with no commitments and end in a state $s$. Let $A_s(x) = \sum_n a_{n,s}x^n$. As in the text, the graph lets us write down the linked equations for the generating functions. From the graph it can be seen that $A_s$ depends only on the number $k$ of commitments in $s$. Therefore we can write $A_s = B_k$. The linked equations are then

$$\begin{aligned} B_0(x) &= x\bigl(B_3(x) + 2B_1(x)\bigr) + 1\\ B_1(x) &= x\bigl(B_0(x) + B_2(x)\bigr)\\ B_2(x) &= xB_1(x)\\ B_3(x) &= xB_0(x),\end{aligned}$$

which can be solved fairly easily for $B_0(x)$.

(b) Equate coefficients of $x^n$ on both sides of $(1 - 4x^2 + x^4)A(x) = 1 - x^2$.
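The linked equations can be unrolled into coefficient recursions, which gives a quick check of part (b). The Python sketch below is ours. It assumes the four equations read $B_0 = x(B_3 + 2B_1) + 1$, $B_1 = x(B_0 + B_2)$, $B_2 = xB_1$ and $B_3 = xB_0$, and the function name is hypothetical.

```python
def domino_counts(N):
    """Coefficients of B_0(x) through x^N, i.e. coverings of a 3 by n board."""
    b0, b1, b2, b3 = [1], [0], [0], [0]  # coefficients of x^0
    for n in range(1, N + 1):
        b0.append(b3[n - 1] + 2 * b1[n - 1])  # B0 = x(B3 + 2 B1) + 1
        b1.append(b0[n - 1] + b2[n - 1])      # B1 = x(B0 + B2)
        b2.append(b1[n - 1])                  # B2 = x B1
        b3.append(b0[n - 1])                  # B3 = x B0
    return b0
```

The output of `domino_counts(8)` should satisfy $a_n = 4a_{n-2} - a_{n-4}$ for $n \ge 4$, which is what equating coefficients in (b) gives.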

(c)  By  looking  at  the  dominoes  in  the  last  two  columns  of  a  board,  we  see  that  it  can  end  in  five 
mutually  exclusive  ways: 


This shows that $a_n$ equals $3a_{n-2}$ plus whatever is counted by the last two of the five cases. A board of length $n-2$ ends with either (i) one vertical domino and one horizontal domino or (ii) three horizontal dominoes. If the vertical dominoes mentioned in (i) are changed to the left ends of horizontal dominoes, they fit with the last two cases shown above. If the three horizontal dominoes mentioned in (ii) are removed, we obtain all boards of length $n-4$. Thus the sum of the last two cases in the picture plus $a_{n-4}$ equals $a_{n-2}$.




11.1.4. There are 3 choices for the first element of the sequence. For $k > 1$, there are 2 choices for the $k$th element since it must differ from the $(k-1)$st element. Thus we get $3 \times 2^{n-1}$.

11.1.5. Call the start state $\alpha$ and let $L_{i,j}$ be the number of different single letter inputs that allow the machine to move from state $i$ to state $j$. Let $a_{n,i}$ be the number of ways to begin in state $\alpha$, recognize $n$ letters and end in state $i$ and let $A_i(x) = \sum_n a_{n,i}x^n$. The desired generating function is the sum of $A_i$ over all accepting states. A linked set of recursions can be obtained from the automaton that leads to the generating function equations

$$A_i(x) = x\sum_j L_{j,i}A_j(x) + \begin{cases} 1 & \text{if } i = \alpha,\\ 0 & \text{otherwise.}\end{cases}$$

11.1.6. Use the same vertices as in Example 11.2. Let $b_{n,k}$ be the number of ways to end at state both after taking $n$ steps and completing $k$ dominoes. Let $b_n(y) = \sum_k b_{n,k}y^k$. Make similar definitions for $c$, $f$ and $s$. On each edge, give the number of dominoes completed in going to the next vertex. The sum of these numbers over a path of length $n$ from clear to clear gives the number of dominoes on a particular board. Each entry in the array in Example 11.2 will then be a sum of monomials $y^d$ where $d$ is the number of dominoes completed. The system of equations in the example remains valid if we replace the coefficients by the new table:

            clear    first    second   both
  clear     1 + y    1        1        1
  first     y        0        y        0
  second    y        y        0        0
  both      y^2      0        0        0

11.1.7 (a) We will use induction. It is true for $n = 1$ by the definition of $m_{x,y} = m^{(1)}_{x,y}$. (It's also true for $n = 0$ because the zeroth power of a matrix is the identity and so $m^{(0)}_{x,y} = 1$ if $x = y$ and 0 otherwise.) Now suppose that $n > 1$. By the definition of matrix multiplication, $m^{(n)}_{x,y} = \sum_z m^{(n-1)}_{x,z}m_{z,y}$. By the induction hypothesis and the definition of $m_{z,y}$ each term in the sum is the number of ways to get from $x$ to $z$ in $n-1$ steps times the number of ways to get from $z$ to $y$ in one step. By the Rules of Sum and Product, the proof is complete.

(b) If $\alpha$ is the initial state, $iM^na^t = \sum m^{(n)}_{\alpha,y}$, the sum ranging over all accepting states $y$.

(c) By the previous part, the desired generating function is

$$\sum_{n=0}^{\infty} iM^na^t x^n = i\Bigl(\sum_{n=0}^{\infty} x^nM^n\Bigr)a^t = i\Bigl(\sum_{n=0}^{\infty} (xM)^n\Bigr)a^t = i(I - xM)^{-1}a^t.$$

(d) The matrix $M$ is replaced by the table given in the solution to the previous exercise.

11.1.8 (a) Following the hint, there are three vertices called 0, 1 and 2. There are six edges: $(x,0)$, $(0,x)$ and $(1,1)$ where $x = 0, 1, 2$. The starting vertex is 0 and all vertices are accepting vertices.

(b) Let $a_{n,k}$ be the number of ways to get from vertex 0 to vertex $k$ in $n$ steps. Then for $n > 0$

$$\begin{aligned} a_{n,0} &= a_{n-1,0} + a_{n-1,1} + a_{n-1,2}\\ a_{n,1} &= a_{n-1,0} + a_{n-1,1}\\ a_{n,2} &= a_{n-1,0}.\end{aligned}$$

With $A_k(x) = \sum_n a_{n,k}x^n$ we have $A_0(x) = x(A_0(x) + A_1(x) + A_2(x)) + 1$, $A_1(x) = x(A_0(x) + A_1(x))$ and $A_2(x) = xA_0(x)$.

(c) Manipulating the last set of equations we have $A_1(x) = xA_0(x)/(1-x)$ and $A_0(x) = xA_0(x)\bigl(1 + \frac{x}{1-x} + x\bigr) + 1$. Thus $A_0(x) = (1-x)/(1 - 2x - x^2 + x^3)$ and $A_0(x) + A_1(x) + A_2(x) = A_0(x)\bigl(1 + \frac{x}{1-x} + x\bigr)$.

(d) The solution is $(1,0,0)(I - xM)^{-1}(1,1,1)^t$ where

$$M = \begin{pmatrix} 1 & 1 & 1\\ 1 & 1 & 0\\ 1 & 0 & 0 \end{pmatrix}.$$

(e) The roots of the equation are roughly 2.2, .55 and $-.8$. By partial fractions, we have that $a_n$ is the closest integer to $a\,r^n$ where

$$r = 2.24697969\ldots \quad\text{and}\quad a = \frac{r(r^2 + r - 1)}{2r^2 + 2r - 3} = 1.2204108\ldots.$$

(If you're wondering how we got the formula for $a$, note that it is $(1 - rx)(1 + x - x^2)/(1 - 2x - x^2 + x^3)$ evaluated at $x = s = 1/r$. By l'Hôpital's Rule from calculus, this is $-r(1 + s - s^2)/(-2 - 2s + 3s^2)$.)
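Parts (d) and (e) can be compared numerically. The following Python sketch is ours. It takes $M$ to be the adjacency matrix of the digraph in part (a), which is our reading of the six edges listed there, computes $iM^na^t$ by repeated matrix-vector products, and compares the result with the nearest integer to $a\,r^n$. The function names are hypothetical.

```python
def walk_counts(M, start, accepting, N):
    """Return [i M^n a^t for n = 0..N] by repeated matrix-vector products."""
    counts, v = [], list(accepting)
    for _ in range(N + 1):
        counts.append(sum(s * x for s, x in zip(start, v)))
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in M]
    return counts


def largest_root():
    """Bisection for the root of r^3 - 2r^2 - r + 1 = 0 that lies between 2 and 3."""
    lo, hi = 2.0, 3.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid**3 - 2 * mid**2 - mid + 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


M = [[1, 1, 1], [1, 1, 0], [1, 0, 0]]  # edges (x,0), (0,x) and (1,1)
counts = walk_counts(M, [1, 0, 0], [1, 1, 1], 15)
r = largest_root()
s = 1 / r
a = -r * (1 + s - s * s) / (-2 - 2 * s + 3 * s * s)  # the l'Hopital expression
estimates = [round(a * r**n) for n in range(16)]
```

Here `counts` holds the exact values and `estimates` the rounded approximations, so the "closest integer" claim can be checked term by term.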

(f) Let $b_{n,k,s}$ be the number of distributions where we end up in state $s$, let $B_{n,s}(y) = \sum_k b_{n,k,s}y^k$ and let $B_s(x,y) = \sum_n B_{n,s}(y)x^n$, so $y$ keeps track of the number of balls and $x$ of the number of boxes. Our earlier equations become

$$\begin{aligned} B_{n,0}(y) &= B_{n-1,0}(y) + B_{n-1,1}(y) + B_{n-1,2}(y)\\ B_{n,1}(y) &= y\bigl(B_{n-1,0}(y) + B_{n-1,1}(y)\bigr)\\ B_{n,2}(y) &= y^2B_{n-1,0}(y).\end{aligned}$$

Thus $B_0(x,y) = x(B_0(x,y) + B_1(x,y) + B_2(x,y)) + 1$, $B_1(x,y) = xy(B_0(x,y) + B_1(x,y))$ and $B_2(x,y) = xy^2B_0(x,y)$. Manipulating the last set of equations we have

$$B_0(x,y) = \frac{1 - xy}{1 - x - xy - x^2y^2 + x^3y^3} \quad\text{and}\quad B_0(x,y) + B_1(x,y) + B_2(x,y) = \frac{1 + xy^2 - x^2y^3}{1 - xy}\,B_0(x,y).$$


Section  11.2 

11.2.1. Theorem. Suppose each structure in a set $\mathcal{T}$ of structures can be constructed from an ordered partition $(K_1, K_2)$ of the labels, two nonnegative integers $\ell_1$ and $\ell_2$, and some ordered pair $(T_1, T_2)$ of structures using the labels $K_1$ in $T_1$ and $K_2$ in $T_2$ such that:

(i) The number of ways to choose a $T_i$ with labels $K_i$ and $\ell_i$ unlabeled parts depends only on $i$, $|K_i|$ and $\ell_i$.

(ii) Each structure $T \in \mathcal{T}$ arises in exactly one way in this process.

(We allow the possibility of $K_i = \emptyset$ if $\mathcal{T}_i$ contains structures with no labels and likewise for $\ell_i = 0$.) It then follows that

$$T(x,y) = T_1(x,y)\,T_2(x,y),$$

where $T_i(x,y) = \sum_{n\ge 0}\sum_{m\ge 0} t_{i,n,m}(x^n/n!)y^m$ and $t_{i,n,m}$ is the number of ways to choose $T_i$ with labels $n$ and $m$ unlabeled parts. Define $T(x,y)$ similarly.

The proof is the same as that for the original Rule of Product except that there is a double sum:

$$t_{n,m} = \sum_{K_1 \subseteq n}\ \sum_{\ell_1=0}^{m} t_{1,|K_1|,\ell_1}\,t_{2,n-|K_1|,m-\ell_1} = \sum_{k=0}^{n}\sum_{\ell_1=0}^{m}\binom{n}{k}\,t_{1,k,\ell_1}\,t_{2,n-k,m-\ell_1}.$$


11.2.2 (a) This is nothing more than a special case of the Rule of Product: at each time we can choose anything from $T$. Repetitions cannot occur because of the labels.

(b) Simply sum the previous result on $k$.

(c) Sum the result for $k$-lists on $k$.

(d) Since breaking the circular lists at all possible places gives all linear lists exactly once, the $k$-long circular lists have generating function $(E_T)^k/k$. Sum on $k$.

11.2.3 (a) By the text,

$$\sum_k z(n,k)y^k = y(y+1)\cdots(y+n-1).$$

Replacing all but the last factor on the right hand side gives us

$$\sum_k z(n,k)y^k = \Bigl(\sum_k z(n-1,k)y^k\Bigr)(y + n - 1).$$

Equate coefficients of $y^k$.

(b) For each permutation counted by $z(n,k)$, look at the location of $n$. There are $z(n-1,k-1)$ ways to construct permutations with $n$ in a cycle by itself. To construct a permutation with $n$ not in a cycle by itself, first construct one of the permutations counted by $z(n-1,k)$ AND then insert $n$ into a cycle. Since there are $j$ ways to insert a number into a $j$-cycle, the number of ways to insert $n$ is the sum of the cycle lengths, which is $n-1$.
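Both parts lead to the recursion $z(n,k) = z(n-1,k-1) + (n-1)z(n-1,k)$. The short Python sketch below is ours and uses hypothetical function names. It builds the table from this recursion and compares one row with a direct count of permutations by number of cycles.

```python
from itertools import permutations


def cycle_table(N):
    """z[n][k] from z(n,k) = z(n-1,k-1) + (n-1) z(n-1,k), with z(0,0) = 1."""
    z = [[0] * (N + 1) for _ in range(N + 1)]
    z[0][0] = 1
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            z[n][k] = z[n - 1][k - 1] + (n - 1) * z[n - 1][k]
    return z


def cycle_count(perm):
    """Number of cycles of a permutation of 0..n-1 given in one-line form."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles


def brute_row(n):
    """Row n of the table obtained by listing all n! permutations."""
    row = [0] * (n + 1)
    for perm in permutations(range(n)):
        row[cycle_count(perm)] += 1
    return row
```

For example, `cycle_table(4)[4]` is the list of $z(4,k)$ for $k = 0, \ldots, 4$.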

11.2.4. The EGF for the $k$th box is

E 

11.2.5 (a) For any particular letter appearing an odd number of times, the generating function is

$$\sum_{n\ \mathrm{odd}} \frac{x^n}{n!} = \frac{e^x - e^{-x}}{2}$$

with Taylor's theorem and some work. We must add 1 to this to allow for the letter not being used. The Rule of Product is then used to combine the results for A, B and C.

(b) Multiplying out the previous result:

$$\begin{aligned} \Bigl(1 + \frac{e^x - e^{-x}}{2}\Bigr)^3 &= 1 + 3(e^x - e^{-x})/2 + 3(e^x - e^{-x})^2/4 + (e^x - e^{-x})^3/8\\ &= 1 + (3e^x/2 - 3e^{-x}/2) + (3e^{2x}/4 - 3/2 + 3e^{-2x}/4) + (e^{3x}/8 - 3e^x/8 + 3e^{-x}/8 - e^{-3x}/8)\\ &= -1/2 + (e^{3x}/8 - e^{-3x}/8) + (3e^{2x}/4 + 3e^{-2x}/4) + (9e^x/8 - 9e^{-x}/8).\end{aligned}$$

Now compute the coefficients.
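The coefficients can be checked against a direct count. The Python sketch below is ours and uses hypothetical function names. It assumes the words are over the letters A, B, C and that each letter is either unused or used an odd number of times, and it takes $n!$ times the coefficient of $x^n$ from the expansion of $(1 + (e^x - e^{-x})/2)^3$ with everything written over the common denominator 8.

```python
from itertools import product


def brute_force(n):
    """Count n-letter words in which every letter that occurs does so an odd number of times."""
    return sum(
        1 for word in product("ABC", repeat=n)
        if all(word.count(c) == 0 or word.count(c) % 2 == 1 for c in "ABC")
    )


def from_egf(n):
    """n! [x^n] of -1/2 + (e^{3x} - e^{-3x})/8 + 3(e^{2x} + e^{-2x})/4 + 9(e^x - e^{-x})/8."""
    numerator = (3**n - (-3)**n) + 6 * (2**n + (-2)**n) + 9 * (1 - (-1)**n)
    if n == 0:
        numerator -= 4  # the constant term -1/2 over the common denominator 8
    return numerator // 8
```

The two functions should agree for every $n$; the brute-force count is only practical for small $n$.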




11.2.6 (a) Part (a) is essentially the same as before, with 3 replaced by $k$.

(b) Use the binomial theorem on $(1+z)^k$ where $z = (e^x - e^{-x})/2$.

(c) We can get rid of the generating function, but the result is messy. We'll show that $a_{n,k} = \sum c_{j,k}\,j^n$ where the sum is over all $1 \le j \le k$ such that $j$ and $n$ have the same parity and

Write $\sinh(x) = (e^x - e^{-x})/2$. We need the coefficient of $x^n/n!$ in

$$(1 + \sinh x)^k = \sum_{t=0}^{k}\binom{k}{t}(\sinh x)^t.$$

Expanding $(e^x - e^{-x})^t$ by the binomial theorem and using Taylor series for $e^x$, the coefficient of $x^n/n!$ in $(\sinh x)^t$ is

$$2^{-t}\sum_{i=0}^{t}(-1)^i\binom{t}{i}\bigl((t-i) - i\bigr)^n.$$

Note that $i = l$ and $i = t - l$ give the same term in the summation except that the sign changes if $t + n$ is odd. Thus the sum is 0 if $t + n$ is odd and twice the value of the sum over $0 \le i < t/2$ otherwise. Using this fact and changing the index of the first summation from $t$ to $j = t - 2i$, we obtain the desired result.

11.2.7. We saw in this section that $B(x) = \exp(e^x - 1)$. Differentiating:

$$B'(x) = \exp(e^x - 1)\,(e^x - 1)' = B(x)e^x.$$

Equating coefficients of $x^n$:

$$\frac{B_{n+1}}{n!} = \sum_{k=0}^{n}\frac{B_k}{k!\,(n-k)!},$$

which gives the result.
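Multiplying the coefficient identity by $n!$ gives $B_{n+1} = \sum_k \binom{n}{k}B_k$. The Python sketch below is ours and uses hypothetical function names. It computes Bell numbers this way and compares them with a direct count of set partitions, built by placing each element either in an existing block or in a new one.

```python
from math import comb


def bell_numbers(N):
    """[B_0, ..., B_N] from B_{n+1} = sum_k C(n,k) B_k."""
    B = [1]
    for n in range(N):
        B.append(sum(comb(n, k) * B[k] for k in range(n + 1)))
    return B


def count_set_partitions(n):
    """Count partitions of an n-set by assigning elements to blocks one at a time."""
    def extend(pos, blocks):
        if pos == n:
            return 1
        # the element joins one of the existing blocks or opens block number `blocks`
        return sum(extend(pos + 1, max(blocks, b + 1)) for b in range(blocks + 1))
    return extend(0, 0)
```

For small $n$ the two computations can be compared entry by entry.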

11.2.8. By the exponential formula, $A(x) = \exp(B(x))$ where $B(x)$ is the sum of $x^n/n!$ over odd $n$. This sum is $(e^x - e^{-x})/2$.

11.2.9 (a) Let $g_{n,k}$ be the number of graphs with $n$ vertices and $k$ components. We have $\sum_{n,k} g_{n,k}(x^n/n!)y^k = \exp(yC(x))$, by the Exponential Formula. Differentiating with respect to $y$ and setting $y = 1$ gives us

$$\frac{\partial \exp(yC(x))}{\partial y}\Big|_{y=1} = H(x).$$

(b) $\sum_k k\,g_{n,k}$ is the number of ways to choose an $n$-vertex graph and mark a component of it. We can construct a graph with a marked component by selecting a component (giving $C(x)$) AND selecting a graph (giving $G(x)$).

(d) Since $C(x) = e^x - 1$, we have $H(x) = (e^x - 1)\exp(e^x - 1)$ and so the coefficient of $x^n/n!$ in $H(x)$ is

$$\sum_{k=0}^{n}\binom{n}{k}B_{n-k} - B_n,$$

which is $B_{n+1} - B_n$ by the previous exercise.

(e) Since $C(x) = x + x^2/2$, we have $H(x) = (x + x^2/2)I(x)$, where $I(x)$ is the EGF for $i_n$, the number of involutions of $n$. Thus the average number of cycles in an involution of $n$ is

$$\frac{n\,i_{n-1} + \binom{n}{2}i_{n-2}}{i_n} = \frac{n}{2}\Bigl(1 + \frac{i_{n-1}}{i_n}\Bigr),$$

where the right side comes from the recursion $i_n = i_{n-1} + (n-1)i_{n-2}$.

11.2.10. $A(x) = \sum_{k\ge 0}(e^x - 1)^k = \dfrac{1}{2 - e^x} = \dfrac{1/2}{1 - e^x/2}$. Expanding this as a geometric series gives $\sum_{k\ge 0} e^{kx}/2^{k+1}$.

11.2.11. Suppose $n > 1$. Since $f$ is alternating,

• $k$ is even;

• $f(1), \ldots, f(k-1)$ is an alternating permutation of $\{f(1), \ldots, f(k-1)\}$;

• $f(k+1), \ldots, f(n)$ is an alternating permutation of $\{f(k+1), \ldots, f(n)\}$.

Thus, an alternating permutation of $n$ for $n > 1$ is built from an alternating permutation of odd length AND an alternating permutation, such that the sum of the lengths is $n-1$. We have shown that

$$\sum_{n>1} a_n\frac{x^{n-1}}{(n-1)!} = B(x)A(x)$$

and so $A'(x) = B(x)A(x) + 1$. Similarly, $B'(x) = B(x)B(x) + 1$.

Separate variables in $B' = B^2 + 1$ and use $B(0) = 0$ to obtain $B(x) = \tan x$. Use the integrating factor $\cos x$ for

$$A'(x) = (\tan x)A(x) + 1$$

and the initial condition $A(0) = 1$ to obtain $A(x) = \tan x + \sec x$.
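The two differential equations give coefficient recursions that are easy to run. The Python sketch below is ours and uses hypothetical function names. It assumes that "alternating" means $f(1) < f(2) > f(3) < \cdots$, which is consistent with $n$ sitting in an even position, and compares the recursion with a brute-force count.

```python
from itertools import permutations
from math import comb


def zigzag(N):
    """EGF coefficients a_n, b_n with A' = BA + 1, B' = B^2 + 1, A(0) = 1, B(0) = 0."""
    a, b = [1], [0]
    for n in range(N):
        extra = 1 if n == 0 else 0  # the "+ 1" only affects the constant term of A', B'
        a.append(sum(comb(n, k) * b[k] * a[n - k] for k in range(n + 1)) + extra)
        b.append(sum(comb(n, k) * b[k] * b[n - k] for k in range(n + 1)) + extra)
    return a, b


def count_alternating(n):
    """Permutations f of 1..n with f(1) < f(2) > f(3) < ... ."""
    def ok(p):
        return all((p[i] < p[i + 1]) == (i % 2 == 0) for i in range(n - 1))
    return sum(1 for p in permutations(range(1, n + 1)) if ok(p))
```

Here `zigzag(7)` returns the coefficient lists for $\tan x + \sec x$ and for $\tan x$, scaled by $n!$.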

11.2.12 (b) Since $S(x) = x(S(x) + 1)^k$, we take $f(u) = (u+1)^k$ and $g(u) = u$ in Theorem 11.5 and look at the coefficient of $u^{n-1}$ in $(u+1)^{kn}/n$, obtaining $s_n = \frac{1}{n}\binom{kn}{n-1}$.

11.2.13 (a) The square of a k-cycle is

• another cycle of length k if k is odd;

• two cycles of length k/2 if k is even.

Using this, we see that the condition is necessary. With further study, you should be able to
see how to take a square root of such a permutation.

(b) This is simply putting together cycles of various lengths using (a) and recalling that there are
$(k - 1)!$ k-cycles.

(c) By bisection, $\sum_{k\ \mathrm{odd}} \frac{x^k}{k} = \frac{1}{2}\bigl(\{-\ln(1 - x)\} - \{-\ln(1 - (-x))\}\bigr)$.

(d)  We  don't  know  of  an  easier  method. 

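11.2.14 (a) This generating function is

$$\Bigl(\prod_{k\ \mathrm{odd}} e^{x^k/k}\Bigr)\Bigl(\prod_{k\ \mathrm{even}} e^{y x^k/k}\Bigr) = \exp\Bigl(\sum_{k\ \mathrm{odd}} x^k/k\Bigr)\exp\Bigl(y \sum_{k\ \mathrm{even}} x^k/k\Bigr) = \sqrt{\frac{1+x}{1-x}}\;(1 - x^2)^{-y/2},$$

where the factors in the last equality were obtained by bisection of the series $-\ln(1 - x) = \sum x^k/k$.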

(b) Differentiate the result in (a) with respect to y and set y = 1 to get the EGF for the number
of even length cycles in a permutation summed over all permutations of a given set:

$$\frac{-\ln(1 - x^2)}{2(1 - x)}.$$

(c) Let the answer to (b) be $a_n$ and let $b_n = \sum_{k=1}^{n} 1/k$, which is the average number of cycles
in a permutation of n. The requested difference is $(b_n - a_n) - a_n = b_n - 2a_n$. The sum is an
approximation to $\int_1^2 x^{-1}\,dx$ with step size 2/n.
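Extracting the coefficient of $x^n$ from $-\ln(1 - x^2)/(2(1 - x))$ gives $\sum_{k \le n,\ k\ \mathrm{even}} 1/k$ for the average number of even length cycles. The sketch below checks this by brute force for small n; it is an illustration only, with hypothetical function names.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def cycle_lengths(p):
    """Cycle lengths of a permutation given as a tuple, p[i] being the image of i."""
    seen = [False] * len(p)
    lengths = []
    for start in range(len(p)):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j] = True
                j = p[j]
                length += 1
            lengths.append(length)
    return lengths

def average_even_cycles(n):
    """Average number of even length cycles over all permutations of n points."""
    total = sum(
        sum(1 for length in cycle_lengths(p) if length % 2 == 0)
        for p in permutations(range(n))
    )
    return Fraction(total, factorial(n))

def predicted_average(n):
    """Sum of 1/k over even k <= n."""
    return sum((Fraction(1, k) for k in range(2, n + 1, 2)), Fraction(0))
```

With $b_n$ the harmonic sum, $b_n - 2\,\texttt{predicted\_average}(n)$ is $\sum_{n/2 < k \le n} 1/k$, which approaches ln 2.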
11.2.15 (a) We'll give two methods. First, we use the Exponential Formula approach in Example 11.13.
When we add a root, the number of leaves does not change except when we started
with nothing and ended up with a single vertex tree. Correcting for this exception gives us the
formula.

Without the use of the Exponential Formula, we could partition the trees according to
the degree k of the root, treating k = 0 specially because in this case the root is a leaf:

$$L(x,y) = xy + x\sum_{k=1}^{\infty} L(x,y)^k/k!.$$

(c) Differentiate the equation in (a) with respect to y and set y = 1 to obtain

$$U(x) = xe^{T(x)}U(x) + x = T(x)U(x) + x,$$

where we have used the fact that $L(x,1) = T(x) = xe^{T(x)}$. Solving for U: $U = \frac{x}{1-T}$. Differentiating
$T(x) = xe^{T(x)}$ and solving for $T'(x)$ gives us $T' = \frac{T}{x(1-T)}$. Thus $x^2T' + x = \frac{x}{1-T}$, which
gives the equation for $U(x)$.

We know that $t_n = n^{n-1}$. It follows from the equation for $U(x)$ that

$$\frac{u_n}{n!} = \frac{t_{n-1}}{(n-2)!} = \frac{(n-1)^{n-2}}{(n-2)!}.$$

Thus $u_n/t_n = n/(1 + x)^{1/x}$, where $x = 1/(n-1)$. As $n \to \infty$, $x \to 0$ and, by l'Hôpital's Rule,

$$(1 + x)^{1/x} = \exp\Bigl(\frac{\ln(1+x)}{x}\Bigr) \to \exp(1) = e.$$

11.2.16 (a) Let $T_n$ be the set of all n-vertex RP-trees. The formula follows from

|T„|  =  n"-i    and     ^  h{T)  =  ^i„,fe. 

TeT„  fe 

(b)  By  the  Exponential  Formula  (Theorem  11.4  (p.  313)),  e^^^'^^  counts  forests  by  vertices  and 
sums  of  heights.  When  we  add  a  new  root,  we  must  increase  the  heights  of  each  of  the  vertices 
by  1.  Since  x  keeps  track  of  all  vertices,  we  can  do  this  simply  by  replacing  x  with  xy  and  then 
add  the  new  root. 

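(c) Differentiating $T = xe^T$ and doing some algebra, we obtain $xT'(x) = \frac{T(x)}{1 - T(x)}$, which will be
useful later. We have

$$D(x) = xe^{T(x,1)}\Bigl(x\frac{\partial T(x,1)}{\partial x} + D(x)\Bigr) \quad\text{by (b)}$$

$$= xe^{T(x)}\bigl(xT'(x) + D(x)\bigr)$$

$$= T(x)\bigl(xT'(x) + D(x)\bigr) \quad\text{by } T = xe^T.$$

Thus

$$D(x) = \frac{xT'(x)T(x)}{1 - T(x)} = \Bigl(\frac{T(x)}{1 - T(x)}\Bigr)^2.$$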


(d) We take $f(u) = e^u$ and we have $g'(u) = 2u/(1 - u)^3$. Thus we need the coefficient of $u^{n-1}$ in
$2ue^{nu}/(1 - u)^3$, which is the same as the coefficient of $u^{n-2}$ in

$$2e^{nu}(1 - u)^{-3},$$

which is a product of two easily expanded functions. This gives us

$$2\sum_{k=0}^{n-2}\binom{k+2}{2}\frac{n^{n-2-k}}{(n-2-k)!}$$

for n times the coefficient of $x^n$ in $D(x)$.
11.2.17 (a) There are several steps.

•  Since  5  is  a  function,  each  vertex  of  ^{g)  has  outdegree  1.  Thus  the  image  of  n-  lies  in 

•  ip  is  an  injection:  if  ip{g)  =  ^p{h),  then  {x,g{x))  ~  {x,  h{x))  for  all  a:  G  n  and  so  g  =  h. 

•  Finally     is  onto        If  {V,E)  e        for  each  a;  G  n  there  is  an  edge  {x,y)  €  E.  Define 
9{x)  =  y. 

(b) Let's think in terms of a function g corresponding to the digraph. Let $k \in \underline{n}$. If the equation
$g^t(k) = k$ has a solution, then k is on a cycle and will be the root of a tree. The other vertices of
the tree are those $j \in \underline{n}$ for which $g^s(j) = k$ for some s.

(c)  This  is  simply  an  application  of  Exercise  11.2.2. 

(d) In the notation of Theorem 11.5, T(x) is T(x), $f(t) = e^t$ and $g(u) = -\ln(1 - u)$. Thus $n(f_n/n!)$
is the coefficient of $u^{n-1}$ in $e^{nu}(1 - u)^{-1}$. Using the convolution formula for the coefficient of a
product of power series, we obtain the result.

11.2.18. Since $g(u) = (1 - 4u)(1 - 3u)^{-2}$, the answer is the coefficient of $u^{n-1}$ in

$$\frac{g'(u)}{n(1 - 3u)^{n}},$$

which equals

$$\frac{2(1 - 6u)}{n(1 - 3u)^{n+3}}.$$

All that remains is some algebraic manipulation.


Section  11.3 

11.3.2. We give the terms in the order the five types of rotations of the cube were listed earlier: no
rotation, ±90° F, 180° F, 180° E and ±120° V.

(a) $\frac{1}{24}(x_1^{12} + 6x_4^3 + 3x_2^6 + 6x_1^2x_2^5 + 8x_3^4)$.

(b) $\frac{1}{24}(x_1^8 + 6x_4^2 + 3x_2^4 + 6x_2^4 + 8x_1^2x_3^2)$.

(c) $\frac{1}{24}(x_1^3 + 6x_1x_2 + 3x_1^3 + 6x_1x_2 + 8x_3) = \frac{1}{6}(x_1^3 + 3x_1x_2 + 2x_3)$. This result can also be obtained
by noting that the symmetries of the cube induce the group $S_3$ of all symmetries of {x, y, z}.

(d) The six labels (±x, ±y and ±z) associated with the axes can be associated with the faces of
the cube: associate each half-axis with the face it passes through. Thus the answer is the same
as the one obtained in the text for faces of the cube.
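The cycle indices above can be recomputed mechanically. The sketch below is an illustration, not the method of the text: it models the 24 rotations of the cube as signed permutations of the three coordinates with determinant +1, lets them act on the vertices, edge midpoints and face centers of the cube with corners (±1, ±1, ±1), and tallies cycle types. All names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product

def rotations():
    """All 24 rotations of the cube as (perm, signs) signed coordinate permutations."""
    result = []
    for perm in permutations(range(3)):
        inversions = sum(
            1 for i in range(3) for j in range(i + 1, 3) if perm[i] > perm[j]
        )
        for signs in product((1, -1), repeat=3):
            # Determinant = (sign of perm) * (product of signs); keep det = +1.
            det = (-1) ** inversions * signs[0] * signs[1] * signs[2]
            if det == 1:
                result.append((perm, signs))
    return result

def apply(rotation, point):
    """Image of a point under a signed coordinate permutation."""
    perm, signs = rotation
    return tuple(signs[i] * point[perm[i]] for i in range(3))

def cycle_index(points):
    """Counter mapping each sorted tuple of cycle lengths to its number of rotations."""
    index = Counter()
    for rot in rotations():
        seen, lengths = set(), []
        for start in points:
            if start in seen:
                continue
            length, p = 0, start
            while p not in seen:
                seen.add(p)
                p = apply(rot, p)
                length += 1
            lengths.append(length)
        index[tuple(sorted(lengths))] += 1
    return index

VERTICES = list(product((1, -1), repeat=3))
EDGES = [p for p in product((1, 0, -1), repeat=3) if p.count(0) == 1]
FACES = [p for p in product((1, 0, -1), repeat=3) if p.count(0) == 2]
```

A key such as `(1, 1, 2, 2, 2, 2, 2)` with count 6 corresponds to the term $6x_1^2x_2^5$; dividing the counts by 24 gives the cycle index.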

11.3.3  (a)  A  regular  octahedron  can  be  surrounded  by  a  cube  so  that  each  vertex  of  the  octahedron 
is  the  center  of  a  face  of  the  cube.  The  center  of  the  octahedron  is  the  center  of  the  cube.  A  line 
segment  from  the  center  of  the  cube  to  a  vertex  of  the  cube  passes  through  the  center  of  the 
corresponding  face  of  the  octahedron.  A  line  segment  from  the  center  of  the  cube  to  the  center 
of  an  edge  of  the  cube  passes  through  the  center  of  the  corresponding  edge  of  the  octahedron. 

(b)  By  the  above  correspondence,  the  answer  will  be  the  same  as  the  symmetries  of  the  cube  acting 
on  the  faces  of  the  cube.  See  (11.31). 

(c)  By  the  above  correspondence  it  is  the  same  as  the  answer  for  the  edges  of  the  cube.  See  the 
previous  exercise. 
11.3.4. Once we have our formula, we can set all $v_i$'s and $f_i$'s equal to 1 to get the result for edges
alone. There are two types of axes of rotation of the regular tetrahedron:

• through a vertex and the center of the opposite face with rotations of ±120° giving 8
possibilities each yielding a term $v_1v_3f_1f_3e_3^2$ and

• through the centers of pairs of opposite edges with a rotation of 180° giving 3 possibilities
each yielding a term $v_2^2f_2^2e_1^2e_2^2$.

In addition, there is the action of doing nothing (identity rotation). Thus we have

$$\frac{1}{12}\bigl(v_1^4f_1^4e_1^6 + 8v_1v_3f_1f_3e_3^2 + 3v_2^2f_2^2e_1^2e_2^2\bigr).$$


Section  11.4 

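11.4.1 (a) We have $r^2 = 2r + 1$ and so $r = 1 + \sqrt2$ and m = 1. By the principle, we expect there
is some constant A such that $a_n \sim A(1 + \sqrt2)^n$.

(b) Since $A(x) = \frac{1+x}{1 - 2x - x^2}$, we have $p(x) = 1 + x$, $q(x) = 1 - 2x - x^2$, $r = \sqrt2 - 1 = 1/(1 + \sqrt2)$
and $q'(r) = -2\sqrt2$. Thus k = 1 and we have

$$a_n \sim \frac{-\sqrt2}{-2\sqrt2\,r^{n+1}} = \frac12\bigl(1 + \sqrt2\bigr)^{n+1}.$$

(c) We have $1 - 2x - x^2 = (1 - ax)(1 - bx)$ where $a = 1 + \sqrt2$ and $b = 1 - \sqrt2$. Expanding by
partial fractions:

$$\frac{1+x}{1 - 2x - x^2} = \frac{1}{1 - 2x - x^2} + \frac{x}{1 - 2x - x^2}$$

$$= \Bigl(\frac{a/(a-b)}{1 - ax} - \frac{b/(a-b)}{1 - bx}\Bigr) + \Bigl(\frac{1/(a-b)}{1 - ax} - \frac{1/(a-b)}{1 - bx}\Bigr)$$

$$= \frac{(2 + \sqrt2)/(2\sqrt2)}{1 - ax} - \frac{(2 - \sqrt2)/(2\sqrt2)}{1 - bx}.$$

Thus $a_n = \frac12(1 + \sqrt2)^{n+1} + \frac12(1 - \sqrt2)^{n+1}$.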
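A quick numerical check of (b) and (c): the coefficients of $(1+x)/(1 - 2x - x^2)$ satisfy $a_n = 2a_{n-1} + a_{n-2}$ with $a_0 = 1$, $a_1 = 3$, and they should agree with the closed form in (c). This is a small sketch with illustrative function names.

```python
from math import sqrt

def coefficients(count):
    """First `count` coefficients of (1 + x)/(1 - 2x - x^2)."""
    a = [1, 3]
    while len(a) < count:
        a.append(2 * a[-1] + a[-2])
    return a[:count]

def closed_form(n):
    """(1/2)(1 + sqrt 2)^(n+1) + (1/2)(1 - sqrt 2)^(n+1), in floating point."""
    s = sqrt(2.0)
    return 0.5 * (1 + s) ** (n + 1) + 0.5 * (1 - s) ** (n + 1)

def asymptotic(n):
    """The estimate from (b): (1/2)(1 + sqrt 2)^(n+1)."""
    return 0.5 * (1 + sqrt(2.0)) ** (n + 1)
```

Because $|1 - \sqrt2| < 1$, the second term of the closed form dies out and the estimate from (b) is accurate to within 1/2 for every n.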

11.4.2. To apply Principle 11.2 (p. 333), we look at $u_n = U_n/n!$ and $v_n = V_n/n!$. The first recursion
becomes

2Un-2      ,       {n  -  4)m„_3  Un-4 


Un-1  + 


n(n-l)     n(n-l)(n-2)     n(n  -  l)(n  -  2)(n  -  3) ' 


which is approximately $u_n = u_{n-1}$ for large n. Since the root is r = 1, we expect $u_n$ to behave
roughly like 1. The same can be concluded for $v_n$. Thus we expect $U_n$ and $V_n$ to behave roughly like
n!.

11.4.3. From the discussion in the example, you can see that merging two lists of lengths i and
$j \ge i$ takes at least i comparisons. Thus the example shows that the number of comparisons for
merge sorting satisfies $T(n) = f(n) + T(m) + T(n - m)$ where $m = \lfloor n/2 \rfloor$ and $m \le f(n) \le n$. Apply
Principle 11.3.
11.4.4 (a) Every cycle length d must divide k. Look at the cycle containing n, choosing the other
elements in the cycle, arranging them and then choosing the remainder of the permutation.

(b) Let $b_n = a_n/(n!)^\alpha$. We must determine α so that the coefficients of the recursion for $b_n$ are
asymptotically constant and not all asymptotically 0. Since

0-l)id-iy-an-d  ^   (n"-7d)(6„-rf(n!/n'^)"   ^  n'^-^-'^'^bn-d 
(n!)«  ^  (n!)«  d 

we must choose α so that $d - 1 - \alpha d \le 0$ for all $d \mid k$, with equality for at least one d. Solving
the inequality for α, we find that $\alpha \ge 1 - 1/d$. Since $d \le k$, we must set $\alpha = 1 - 1/k$. The recursion
becomes $b_n \sim b_{n-k}/k$. If this were equality, we would have $b_n = C/k^{n/k}$. Thus $a_n$ should grow
like $(n!)^{(k-1)/k}/k^{n/k}$, which is roughly like $n!\,(e/nk)^{n/k}$.

11.4.5. We'll use Principle 11.4 (p. 337) so $t_{n,k}$ will denote the kth term of the sum we're given.

(a) Since

$$\frac{t_{n,k+1}}{t_{n,k}} = \frac{n-k}{n}$$

is less than 1 and is close to 1 when k/n is small, we'll use Principle 11.5 (p. 337). Since

$$\frac{1 - r_k}{k} = \frac{1 - (n-k)/n}{k} = \frac{1}{n},$$

(11.38)  gives  the  estimate 


2 

(b) This is a bit more complicated than (a) since

$$\frac{t_{n,k+1}}{t_{n,k}} = \frac{k+1}{k}\cdot\frac{n-k}{n}$$

is greater than 1 for small k and less than 1 for k near n. Thus $t_{n,k}$ achieves its maximum
somewhere between 1 and n, namely, when the above ratio equals 1. This leads to a quadratic
equation for k which has the solution

$$k = \frac{-1 + \sqrt{1 + 4n}}{2}.$$

Since this differs from $\sqrt n$ by at most a constant, we'll split the sum into two pieces at $k = \sqrt n$
and use Principle 11.5 (p. 337) for each half. Since each half has the same estimate, we simply
double one result. Ignoring the fact that $\sqrt n$ is not an integer, we set $k = \sqrt n + j$ and use $j \ge 0$
as the new index of summation. Call the new terms $t'_{n,j}$. We have

$$r_j = \frac{t'_{n,j+1}}{t'_{n,j}} = \frac{t_{n,k+1}}{t_{n,k}} = \frac{k+1}{k}\cdot\frac{n-k}{n} = \frac{\sqrt n + j + 1}{\sqrt n + j}\cdot\frac{n - \sqrt n - j}{n}$$

$$= \Bigl(1 + \frac{1}{\sqrt n + j}\Bigr)\Bigl(1 - \frac{\sqrt n + j}{n}\Bigr) = 1 - \frac{(\sqrt n + j)^2 - n + (\sqrt n + j)}{n(\sqrt n + j)}$$

$$= 1 - \frac{2j\sqrt n + j^2 + \sqrt n + j}{n(\sqrt n + j)} \approx 1 - \frac{2j\sqrt n}{n\sqrt n} = 1 - \frac{2j}{n}.$$
Thus $(1 - r_j)/j \approx 2/n$ and so we obtain the following approximation (the factor of 2 is due to
the presence of two sums)

$$2\sqrt{\pi n/4}\;t'_{n,0} = \sqrt{\pi n}\;t_{n,\sqrt n},$$

where the factor $(n - \sqrt n)!$ in $t_{n,\sqrt n}$ should be approximated using Stirling's formula (Theorem 1.5 (p. 9)) since
we have no formula for x! when x is not a positive integer.

11.4.6 (a) A partition consisting of a single block has EGF $e^x - 1$ since there is just one such
partition for each n > 0. Thus, the generating function of a list of k blocks is $(e^x - 1)^k$.
Summing on k gives the result.

(b) In the notation of Principle 11.6 (p. 341), the denominator vanishes at r = ln 2 so we try
$f(x) = (1 - x/\ln 2)^{-1}$. Using some algebra and l'Hôpital's Rule,

$$\lim_{x \to \ln 2}\frac{A(x)}{f(x)} = \lim_{x \to \ln 2}\frac{1 - x/\ln 2}{2 - e^x} = \lim_{x \to \ln 2}\frac{-1/\ln 2}{-e^x} = \frac{1}{2\ln 2}.$$

Remembering that we have an EGF, (11.43) gives

$$\frac{a_n}{n!} \sim \frac{1}{2\ln 2}\binom{n - (-1) - 1}{n}(\ln n)^0(\ln 2)^{-n} = \frac{1}{2(\ln 2)^{n+1}}.$$
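(c) Since $A(x) = \frac12(1 - e^x/2)^{-1} = \frac12\sum(1/2)^ke^{kx}$, the summation for $a_n$ follows easily. Using
$1 + x \approx e^x$ for small x, we have

$$\frac{t_{n,k+1}}{t_{n,k}} = \frac{(k+1)^n}{2k^n} = \frac12(1 + 1/k)^n \approx e^{n/k}/2.$$

Thus the maximum term occurs at about k = n/ln 2. Split into two sums at this point and
use Principle 11.5 (p. 337). Both sums will be asymptotically equal. Let k = n/ln 2 + j. Then

$$\frac{t'_{n,j+1}}{t'_{n,j}} = \frac12\Bigl(1 + \frac{1}{j + n/\ln 2}\Bigr)^n \approx \exp\Bigl(\frac{n}{n/\ln 2 + j} - \ln 2\Bigr) = \exp\Bigl(\frac{-j\ln 2}{n/\ln 2 + j}\Bigr)$$

$$\approx \exp\bigl(-j(\ln 2)^2/n\bigr) \approx 1 - j(\ln 2)^2/n.$$

Thus $(1 - r_j)/j \approx (\ln 2)^2/n$ and so we obtain

$$a_n \sim 2\sqrt{\frac{\pi n}{2(\ln 2)^2}}\;t_{n,n/\ln 2} = \frac{\sqrt{2\pi n}\,(n/\ln 2)^n}{2e^n\ln 2} = \frac{\sqrt{2\pi n}\,(n/e)^n}{2(\ln 2)^{n+1}}.$$

This differs from the first estimate by an application of Stirling's formula for n!.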
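The estimate $a_n \sim n!/(2(\ln 2)^{n+1})$ from part (b) is remarkably accurate even for small n, which a direct computation makes visible. The sketch below assumes, as in part (a), that $a_n$ counts ordered lists of nonempty blocks partitioning an n-set (with $a_0 = 1$); the function names are illustrative.

```python
from math import comb, factorial, log

def ordered_partition_counts(limit):
    """a_0..a_limit, where a_n counts ordered partitions of an n-set into blocks."""
    a = [1]
    for n in range(1, limit + 1):
        # Choose the k elements of the first block, then order the remaining n - k.
        a.append(sum(comb(n, k) * a[n - k] for k in range(1, n + 1)))
    return a

def estimate(n):
    """The asymptotic estimate n! / (2 (ln 2)^(n+1))."""
    return factorial(n) / (2 * log(2) ** (n + 1))
```

Comparing `ordered_partition_counts(10)[10]` with `estimate(10)` shows agreement to many digits.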
11.4.7. Use Principle 11.6 (p. 341) with r = 1, b = 0 and c = −1 to obtain

^  kes 
11.4.8 (a) The number of undirected k-cycles on k vertices is 1 if k = 1 or k = 2 and is (k − 1)!/2 if k > 2
because there are (k − 1)! directed k-cycles and each undirected cycle gives rise to two directed
ones if k > 2. However, the 1-cycles and 2-cycles are not allowed in a simple graph since a
1-cycle gives a loop and a 2-cycle gives a double edge. Hence the number is (k − 1)!/2 for k > 2
and 0 for k ≤ 2. Use the Exponential Formula (Theorem 11.4 (p. 313)).

(b) We use Principle 11.6 (p. 341) with r = 1, b = 0 and c = −1/2. Since $L = \exp(-\frac12 - \frac14) = e^{-3/4}$,

$$a_n \sim n!\,e^{-3/4}n^{-1/2}/\Gamma(1/2) = \frac{n!}{e^{3/4}\sqrt{\pi n}}.$$

11.4.9 (a) The functions have singularities at x = −1 as well as at x = 1, so we can't use
Principle 11.6 (p. 341). Another reason we can't use it for $A_e(x)$ is that $a_{e,n} = 0$ whenever n is
odd.

(b) For both cases, r = 1, b = 0 and c = −1/2. We obtain $L = \sqrt2$ for $A_o(x)$ and $L = 1/\sqrt2$ for
$A_e(x)$.

(c) By power series, $a_{e,2n} = (-1)^n\binom{-1/2}{n}(2n)!$, which can be rearranged to give the answer. By
Stirling's formula, $a_{e,2n} \sim (2n)!/\sqrt{\pi n} \sim 2(2n/e)^{2n}$.

(d) Since $A_o(x) = (1 + x)A_e(x)$, we have $a_{o,2n} = a_{e,2n}$ and $a_{o,2n+1} = (2n+1)a_{e,2n}$. By the previous
part, $a_{o,2n} = \binom{2n}{n}(2n)!\,4^{-n}$ and $a_{o,2n+1} = \binom{2n}{n}(2n+1)!\,4^{-n}$.

11.4.10  (a)    This  follows  easily  from  Exercise  10.4.1(b). 

(b) Since S is finite, $\sum_{k \in S} x^k$ is a polynomial and so we have a rational generating function. We
can use Example 11.28.

(c)  Again,  the  generating  function  is  rational,  but  it  is  not  so  obvious: 

11  1  1-x 


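11.4.11. We use Principle 11.6 (p. 341) with

$$A(x) = (1 - 2x - 3x^2)^{-1/2}, \quad b = 0 \quad\text{and}\quad c = -1/2.$$

Since $1 - 2x - 3x^2 = (1 - 3x)(1 + x)$, r = 1/3 and

$$L = \lim_{x \to 1/3}\frac{(1 - 2x - 3x^2)^{-1/2}}{(1 - 3x)^{-1/2}} = \lim_{x \to 1/3}\frac{1}{\sqrt{1 + x}} = \frac{\sqrt3}{2}.$$

Thus

$$a_n \sim \frac{\sqrt3\;3^n n^{-1/2}}{2\Gamma(1/2)} = \frac{3^{n+1/2}}{2\sqrt{\pi n}}.$$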
11.4.12. For n > 0, all that matters is $-\sqrt{1 - 2x - 3x^2}/2x$. This can be dealt with much like the
generating function in the previous problem.

11.4.13.  We  use  Principle  11.6  (p.  341)  with 
The square root vanishes at x = r = 0.295597742 . . . and so


-v/(l  +  a;2)2-4a;  1  (1  -  x2)2  -  4x 

L  =  lim   ,    =  —  -  W  lim   ;  -.  . 

x^r      2-s/l-x/r  2^x^r  1-x/r 

By using l'Hôpital's Rule, we obtain

L  =  =  -Vr  +  r2(l-r2)  =  -0.61265067. 

2  y         — l/r 

11.4.14.  We  give  two  methods  for  obtaining  the  generating  function. 

The generating function for one cycle is $\sum (n-1)!\,x^n/n! = -\ln(1 - x)$. An ordered k-list is
obtained by taking the kth power and it is unordered by dividing by k! since labeled objects will
always have different labels.

Use the Exponential Formula (Theorem 11.4 (p. 313)) with a second variable keeping track of
the number of cycles to get $\exp(-y\ln(1 - x))$. Extract the coefficient of $y^k$.

For the asymptotics, use Principle 11.6 (p. 341) with r = 1, b = k and c = 0. Then L = 1/k!
and

$$a_n \sim \frac{n!}{k!}\cdot\frac{k(\ln n)^{k-1}}{n} = \frac{(n-1)!\,(\ln n)^{k-1}}{(k-1)!}.$$
11.4.15. By techniques we have used before,

$$H(x) = x\sum_{k \ge 2} H(x)^k + x.$$

Sum the geometric series and use algebra to obtain the desired quadratic equation for H(x).

This quadratic could be treated as an implicit equation for H(x) and we could apply Principle 11.7 (p. 345).
Alternatively, we could solve the quadratic for H(x) and use Principle 11.6 (p. 341).
For the former, let $F(x,y) = y^2 - y + \frac{x}{1+x}$. For the latter,

$$H(x) = \frac12 - \frac12\sqrt{1 - \frac{4x}{1+x}}.$$

So we take $A(x) = -\sqrt{1 - 4x/(1+x)}/2$, r = 1/3, b = 0 and c = 1/2. In any case, the answer is

$$a_n \sim \frac{\sqrt3\;3^n}{8\sqrt{\pi n^3}}.$$


11.4.16. We use Principle 11.7 (p. 345) with

$$F(x,y) = (1 + x)y - xe^y, \quad F_y(x,y) = 1 + x - xe^y \quad\text{and}\quad y = H(x).$$

From $F(r,s) = 0 = F_y(r,s)$, we see that s = 1 and so $1 + r - re = 0$, which gives $r = \frac{1}{e-1}$. Easily

$$F_x(r,s) = s - e^s = 1 - e \quad\text{and}\quad F_{yy}(r,s) = -re^s = \frac{-e}{e-1}.$$

Thus $h_n/n! \sim (e - 1)^{n+1/2}/\sqrt{2\pi e n^3}$.
11.4.17. We would like to use Principle 11.6 (p. 341), but there is a problem with all of the principles
about when we can use them. At any rate, we want to solve

$$\sum_{k \in D} r^k/k! = 1$$

for r. This can always be done since the sum vanishes at r = 0 and goes to +∞ as r → +∞.

Let d = gcd(D). You should be able to see that $a_n = 0$ when n is not a multiple of d. Hence we'll
need to assume that d = 1. (Actually you can get around this by setting $x^d = z$, a new variable.)

It turns out that when all this is done, (11.43) gives the correct answer, where r is as indicated,
b = 0 and c = −1. The answer for general d is

$$a_n \sim \frac{d\,n!}{r^n\sum_{k \in D} r^k/(k-1)!} \quad\text{when d divides n}$$

and $a_n = 0$, otherwise.

11.4.18. Use Principle 11.7 (p. 345) as in Example 11.33. We have

$$F(x,y) = 1 - y + x(y^2 + T(x^2))/2 \quad\text{and}\quad F_y(x,y) = -1 + xy.$$

Thus s = 1/r and, using this in $F(r,s) = 0$ followed by some algebra,

$$r^2T(r^2) = 1 - 2r.$$

To evaluate xT(x) for particular x, we first solve the quadratic to obtain

$$xT(x) = 1 - \sqrt{1 - 2x - x^2T(x^2)} = \frac{2x + x^2T(x^2)}{1 + \sqrt{1 - 2x - x^2T(x^2)}}.$$

This can be iterated to express xT(x) in terms of x and $x^{2^k}T(x^{2^k})$. Since $x^{2^k}T(x^{2^k})$ rapidly
approaches 0 as k grows, we can obtain good estimates for xT(x). In this way, we obtain
r = 0.402697 . . . . We omit the details.

11.4.19. Use Principle 11.6 (p. 341) in this exercise.

(a) Set $f(x) = (1 - x/r)^{kc}$ and note that

$$\frac{B(x)}{f(x)} = \Bigl(\frac{A(x)}{(1 - x/r)^c}\Bigr)^k \to L^k \quad\text{as } x \to r.$$


(b) We have

$$(g(x) + p(x))^k = p(x)^k + \sum_{i=1}^{k}\binom{k}{i}p(x)^{k-i}g(x)^i.$$

Look at each term on the right side separately. Since $p(x)^k$ is a polynomial, it does not contribute
to the asymptotics. It is also possible that $g(x)^i$ is a polynomial for some i > 1 and so
that term does not contribute to the asymptotics either. For those that do contribute, we have
so the contribution should be asymptotic to


n 


-ci-l  ^-n 


The first factor is independent of n and the last factor is independent of i so the relative
importance of the terms is determined by $n^{-ci-1}$, which is largest when i is as small as possible;
i.e., i = 1. This gives the result.

(c) Since A(x) is a sum of nonnegative terms, it is an increasing function of x and so A(x) = 1 has
at most one positive solution. We take b = 0, c = −1 and $f(x) = (1 - x/s)^{-1}$ in Principle 11.6. Then

$$\lim_{x \to s}\frac{(1 - A(x))^{-1}}{(1 - x/s)^{-1}} = \lim_{x \to s}\frac{1 - x/s}{1 - A(x)} = \frac{1}{sA'(s)}$$

by l'Hôpital's Rule.

Suppose that c < 0. Note that A(0) = 0 and that A(x) is unbounded as x → r because
$A(x)/(1 - x/r)^c$ approaches a nonzero limit. Thus A(x) = 1 has a solution in (0, r).

11.4.20.  ??? 

11.4.21.  For  Exercise  11.2.2(a),  use  Exercise  11.4.19(a,b). 

For  Exercise  11.2.2(b),  use  Exercise  11.4.19(c). 

Exercise  11.2.2(d)  can  be  done  like  Exercise  11.4.19(c)  was. 

The only difference is that, since we are dealing with a logarithm, Principle 11.6 is used with b = 1
and c = 0 instead of with b = 0 and c = −1.

11.4.22 (b) Let U(x) = T(x)/x. The equation for U is $U = \sum x^dU^d/d!$. Replacing $x^k$ with z, we
see that this leads to a power series for U in powers of $z = x^k$. Thus the coefficients of $x^m$ in
U(x) will be 0 when m is not a multiple of k.

(c)   We  apply  Principle  11.7  (p.  345)  with 


Then 


lim 


{l-A{x))-' 

(I-X/S)-! 


=  lim 


1  —  x/s 
1  -  A{x) 


sA'{s) 


1 


$$F(x,y) = y - x\sum y^d/d! \quad\text{and}\quad F_y(x,y) = 1 - x\sum y^{d-1}/(d-1)!.$$


Using  F{r,  s)  =  0  =  Fy{r,  s)  and  some  algebra,  we  obtain 


dT^O 


Once  the  first  of  these  has  been  solved  numerically  for  s,  the  rest  of  the  calculations  are 
straightforward. 
Appendix A

A.1. A(n) is the formula $1 + 3 + \cdots + (2n - 1) = n^2$ and $n_0 = n_1 = 1$. A(1) is just $1 = 1^2$. To prove
A(n + 1) in the inductive step, use A(n):

$$(1 + 3 + \cdots + (2n - 1)) + (2n + 1) = n^2 + (2n + 1) = (n + 1)^2.$$

A.2. A(n) is the equality and $n_0 = n_1 = 0$. It is simple to check that A(0) is true. Now for the inductive
step with n > 0:

$$\sum_{k=0}^{n} k^2 = n^2 + \sum_{k=0}^{n-1} k^2$$

$$= n^2 + (n - 1)(n - 1 + 1)(2(n - 1) + 1)/6 \quad\text{by } A(n - 1)$$

$$= n(n + 1)(2n + 1)/6 \quad\text{by algebra.}$$

A.3. Let A(n) be $\sum_{k=1}^{n}(-1)^{k-1}k^2 = (-1)^{n-1}\sum_{k=1}^{n}k$. By (A.1), we can replace the right hand side
of A(n) by $(-1)^{n-1}n(n + 1)/2$, which we will do. It is easy to verify A(1). As usual, the induction
step uses A(n − 1) to evaluate $\sum_{k=1}^{n-1}(-1)^{k-1}k^2$ and some algebra to prove A(n) from this.

What would have happened if we hadn't thought to use (A.1)? The proof would have gotten
more complicated. To prove A(n) we would have needed to prove that

$$(-1)^{n-2}\sum_{k=1}^{n-1}k + (-1)^{n-1}n^2 = (-1)^{n-1}\sum_{k=1}^{n}k.$$

At this point, we would have to prove this result separately by induction or prove it using (A.1).
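A.4. Let A(n) be

$$\sum_{k=0}^{n-1}\frac{2^kx^{2^k}}{1 + x^{2^k}} = \frac{x}{1 - x} - \frac{2^nx^{2^n}}{1 - x^{2^n}}.$$

For A(1), the left side is x/(1 + x) and the right side is

$$\frac{x}{1 - x} - \frac{2x^2}{1 - x^2} = \frac{x(1 + x) - 2x^2}{(1 - x)(1 + x)} = \frac{x}{1 + x}.$$

The inductive step follows the usual pattern we've established for sums, but there's a bit of algebra:

$$\sum_{k=0}^{n-1}\frac{2^kx^{2^k}}{1 + x^{2^k}} = \sum_{k=0}^{n-2}\frac{2^kx^{2^k}}{1 + x^{2^k}} + \frac{2^{n-1}x^{2^{n-1}}}{1 + x^{2^{n-1}}}$$

$$= \frac{x}{1 - x} - \frac{2^{n-1}x^{2^{n-1}}}{1 - x^{2^{n-1}}} + \frac{2^{n-1}x^{2^{n-1}}}{1 + x^{2^{n-1}}}$$

$$= \frac{x}{1 - x} - 2^{n-1}x^{2^{n-1}}\,\frac{(1 + x^{2^{n-1}}) - (1 - x^{2^{n-1}})}{(1 - x^{2^{n-1}})(1 + x^{2^{n-1}})}$$

$$= \frac{x}{1 - x} - \frac{2^nx^{2^n}}{1 - x^{2^n}}.$$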
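The identity in A.4 can be spot-checked with exact rational arithmetic before attempting the induction. The sketch below takes A(n) to be $\sum_{k=0}^{n-1} 2^kx^{2^k}/(1 + x^{2^k}) = x/(1 - x) - 2^nx^{2^n}/(1 - x^{2^n})$, the form that reduces to the A(1) computation shown in the solution; the function names are illustrative.

```python
from fractions import Fraction

def left_side(x, n):
    """Sum over k = 0..n-1 of 2^k x^(2^k) / (1 + x^(2^k))."""
    return sum(
        (Fraction(2) ** k * x ** (2 ** k) / (1 + x ** (2 ** k)) for k in range(n)),
        Fraction(0),
    )

def right_side(x, n):
    """x/(1 - x) - 2^n x^(2^n) / (1 - x^(2^n))."""
    return x / (1 - x) - Fraction(2) ** n * x ** (2 ** n) / (1 - x ** (2 ** n))
```

Any rational x with $x \ne \pm 1$ and $x^{2^k} \ne -1$ can be used, for example `Fraction(1, 3)`.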

A.5. The claim is true for n = 1. For n + 1, we have

$$(x^{n+1})' = (x^n \cdot x)' = (x^n)'x + (x^n)x' = (nx^{n-1})x + x^n,$$

where the last used the induction hypothesis. Since the right side is $(n + 1)x^n$, we are done.
A.6. Call the answer $I_n$. Note that $I_0 = \int_0^\infty e^{-x}\,dx = 1$. When n > 0, we can evaluate the integral
using integration by parts with $u = x^n$ and $dv = e^{-x}\,dx$:

$$\int_0^\infty x^ne^{-x}\,dx = \Bigl[-x^ne^{-x}\Bigr]_0^\infty + \int_0^\infty nx^{n-1}e^{-x}\,dx = 0 + nI_{n-1}.$$

You should be able to see that $I_n = n!$.

Now for the proof. The formula is true for n = 0 since we calculated that $I_0 = 1$. For n > 0 we have
$I_n = nI_{n-1} = n(n - 1)! = n!$, where we used the induction hypothesis to evaluate $I_{n-1}$.

A.7. The inductive step only holds for n ≥ 3 because the claim that $P_{n-1}$ belongs to both groups
requires n − 1 ≥ 2; however, A(2) was never proved. (Indeed, if A(2) is true, then A(n) is true for
all n.)

A.8. A(1), which was never checked, is false.

A.9. This is obviously true for n = 1. Suppose we have a numbering when n − 1 lines have been
drawn. The nth line divides the plane into two parts, say A and B. Assign all regions in A the same
number they had with n − 1 lines and reverse the numbering in B.


Section  B.l 

B.1.1. We'll do the case where the functions need not be nonnegative. Also, the proofs for O
properties are omitted because they are like those for Θ without the "A" part of the inequalities.

Throughout for f and g (or $f_i$ and $g_i$) let A and B (or $A_i$ and $B_i$) be as in the definition of Θ. All
inequalities are understood to hold for some A's and B's and all sufficiently large n.

(a) Since g(n) is Θ(f(n)), |g(n)| ≤ B|f(n)|, which means that g(n) is O(f(n)).

(b) This follows from the definition with A = B = 1.

(c) We need not require that C and D be positive. Let A' = A|C/D| and B' = B|C/D|. Then

A'|Df(n)| ≤ |Cg(n)| ≤ B'|Df(n)|.

(d) We have (1/B)|g(n)| ≤ |f(n)| ≤ (1/A)|g(n)|.

(e) This was done in the text.

(f) As noted in the remark, this requires that $f_i$ and $g_i$ be nonnegative. With $A = \min(A_1, A_2)$,
$B = B_1 + B_2$ and $f(n) = \max(f_1(n), f_2(n))$ we have

$$Af(n) \le \max(A_1f_1(n), A_2f_2(n)) \le g_1(n) + g_2(n) \le Bf(n).$$

B.1.2. Let $g_1(n) = -g_2(n) = n$ and $f_1(n) = f_2(n) = n$. Then $g_i(n)$ is Θ($f_i(n)$) but $g_1(n) + g_2(n) = 0$
and $\max(f_1(n), f_2(n)) = n$.

B.1.3. This is not true. For example, n is O(n²), but n² is not O(n).
B.1.4  (a)    Hint:  You  can  first  show  this  with  g{x)  =  x^  and  adjust  the  constants, 
(b)    Hint:  You  can  first  show  this  with  g{x)  =  x^  and  adjust  the  constants. 
B.1.5  (a)    Hint:  There  is  an  explicit  formula  for  the  sum  of  the  squares  of  integers. 

(b)  Hint:  There  is  an  explicit  formula  for  the  sum  of  the  cubes  of  integers. 

(c)  Hint:  If  you  know  calculus,  upper  and  lower  Riemann  sum  approximations  to  the  integral  of 
f{x)  =  x^l"^  can  be  used  here. 
B.1.6 (a) Hint: If you know calculus, upper and lower Riemann sum approximations to the integral
of  f{x)  =       can  be  used  here. 

(b) Hint: If you know calculus, upper Riemann sum approximations to the integral of f(x) =

log^(a;)  can  be  used  here. 

B.1.7 (a) Here's a chart of values.

n                        5          10      30          100     300
A: n^2                   25         10^2    9 x 10^2    10^4    9 x 10^4
B: 100n                  5 x 10^2   10^3    3 x 10^3    10^4    3 x 10^4
C: 100(2^(n/10) - 1)     41         10^2    7 x 10^2    10^5    10^11
fastest                  A          A, C    C           A, B    B
slowest                  B          B       B           C       C

(b)  When  n  is  very  large,  B  is  fastest  and  C  is  slowest.  This  is  because,  (i)  of  two  polynomials 
the  one  with  the  lower  degree  is  eventually  faster  and  (ii)  an  exponential  function  grows  faster 
than  any  polynomial. 
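The chart in (a) can be regenerated with a few lines of code. The sketch below assumes the three running times are $n^2$ for A, $100n$ for B and $100(2^{n/10} - 1)$ for C, which is how the row labels of the chart are read here; the function names are hypothetical.

```python
def running_times(n):
    """Running times of the three algorithms on an input of size n."""
    return {"A": n ** 2, "B": 100 * n, "C": 100 * (2 ** (n / 10) - 1)}

def fastest_and_slowest(n):
    """Names of the algorithms with the smallest and largest running time (ties kept)."""
    times = running_times(n)
    low, high = min(times.values()), max(times.values())
    fastest = sorted(name for name, t in times.items() if t == low)
    slowest = sorted(name for name, t in times.items() if t == high)
    return fastest, slowest
```

Calling `fastest_and_slowest(n)` for n = 5, 10, 30, 100, 300 reproduces the last two rows of the chart, including the ties at n = 10 and n = 100.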

B.1.8.  Need  solution 

B.1.9. Let $p(n) = \sum_{i=0}^{k} b_in^i$ with $b_k > 0$.

(a) Let $s = \sum_{i=0}^{k-1}|b_i|$ and assume that $n \ge 2s/b_k$. We have

$$|p(n) - b_kn^k| \le \Bigl|\sum_{i=0}^{k-1}b_in^i\Bigr| \le \sum_{i=0}^{k-1}|b_i|n^i \le \sum_{i=0}^{k-1}|b_i|n^{k-1} = sn^{k-1} \le b_kn^k/2.$$

Thus $|p(n)| \ge b_kn^k - b_kn^k/2 \ge (b_k/2)n^k$ and also $|p(n)| \le b_kn^k + b_kn^k/2 \le (3b_k/2)n^k$.

(b) This follows from (a) of the theorem.

(c) By applying l'Hôpital's Rule k times, we see that the limit of $p(n)/a^n$ is $\lim\,(k!\,b_k/(\log a)^k)/a^n$,
which is 0.

(d) By the proof of the first part, $p(n) \le (3b_k/2)n^k$ for all sufficiently large n. Thus we can take
$C \ge 3b_k/2$.

(e) For $a^{p(n)}$ to be Θ($a^{Cn^k}$), we must have positive constants A and B such that $A \le a^{p(n)}/a^{Cn^k} \le B$.
Taking logarithms gives us $\log_a A \le p(n) - Cn^k \le \log_a B$. The center of this expression is a
polynomial which is not constant unless $p(n) = Cn^k + D$ for some constant D, the case which
is ruled out. Thus $p(n) - Cn^k$ is a nonconstant polynomial and so is unbounded.

B.1.10. We have $a^{f(n)}/b^{f(n)} = (a/b)^{f(n)} \to 0$ since 0 < a/b < 1 and f(n) → +∞. Also,
=  — »  0. 

B.1.11  (a)    The  worst  time  possibility  would  be  to  run  through  the  entire  loop  because  the  "If" 
always  fails.  In  this  case  the  running  time  is  6(n).  This  actually  happens  for  the  permutation 

Ui  =  i  for  all  i. 

(b)  Let  be  the  number  of  permutations  which  have  aj_i  <  for  2  <  z  <  and  Uk  >  Ofe+i. 
(There  is  an  ambiguity  about  what  to  do  for  the  permutation  Ui  =  i  for  all  i,  but  it  contributes 
a  negligible  amount  to  the  average  running  time.)  The  "If"  statement  is  executed  k  times  for 
such  permutations.  Thus  the  average  number  of  times  the  "If"  is  executed  is  ^  kNk/n\.  If  the 
fli's  were  c;lioseii  iiidcpendeiitly  one  at  a  time  from  all  the  integers  so  that  no  adjacent  ones  are 
equal,  the  chances  that  all  the  k  inequalities  ai  <  02  <  •  •  •  <      >  a^+i  hold  would  be  (1/2)*^. 
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This  would  give  Nk/n\  =  (1/2)'^  and  then  J2T=a  kNk/n\  would  converge  by  the  "ratio  test." 
This  says  that  the  average  running  time  is  bounded  for  all  n.  Unfortunately  the  a^'s  cannot  be 
chosen  as  described  to  produce  a  permutation  of  n. 

We need to determine $N_k$. With each arbitrary permutation $a_1, a_2, \ldots$ we can associate a
set of permutations $b_1, b_2, \ldots$ counted by $N_k$. We'll call this the set for $a_1, a_2, \ldots$. For $i > k+1$,
$b_i = a_i$, and $b_1, \ldots, b_{k+1}$ is a rearrangement of $a_1, \ldots, a_{k+1}$ to give a permutation counted by
$N_k$. How many such rearrangements are there? $b_{k+1}$ can be any but the largest of $a_1, \ldots, a_{k+1}$,
and the remaining $b_i$'s must be the remaining $a_i$'s arranged in increasing order. Thus there
are $k$ possibilities, and so the set for $a_1, a_2, \ldots$ contains $k$ permutations counted by $N_k$.
Since there are $n!$ permutations, the sets contain a total of $n!\,k$ things; however, each
permutation $b_1, b_2, \ldots$ counted by $N_k$ appears in many sets. In fact it appears in $(k+1)!$ of
them, since any rearrangement of the first $k+1$ $b_i$'s gives a permutation that has
$b_1, b_2, \ldots$ in its set. Thus the number of things in all the sets is also
$N_k (k+1)!$. Consequently $N_k = n!\,k/(k+1)!$.
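
The count $N_k = n!\,k/(k+1)!$ can be checked by brute force for small $n$. The following is a minimal Python sketch, not part of the original solution; the function name count_nk is illustrative, and it assumes $1 \le k < n$ so that $a_{k+1}$ exists.

```python
from itertools import permutations
from math import factorial


def count_nk(n, k):
    """Count permutations a_1..a_n with a_1 < ... < a_k > a_{k+1}.

    Assumes 1 <= k < n. Positions are 0-based in the code, so a_k is a[k - 1].
    """
    count = 0
    for a in permutations(range(1, n + 1)):
        rising = all(a[i] < a[i + 1] for i in range(k - 1))
        if rising and a[k - 1] > a[k]:
            count += 1
    return count


if __name__ == "__main__":
    n = 6
    for k in range(1, n):
        predicted = factorial(n) * k // factorial(k + 1)
        print(k, count_nk(n, k), predicted)
```

The enumeration takes time proportional to $n!$, so it is only useful as a sanity check for small $n$.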

By the previous paragraphs, the average number of times the "If" is executed is
$\sum_k k^2/(k+1)!$, which approaches some constant. Thus the average running time is $\Theta(1)$.
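
The limiting constant can be found explicitly. The following worked computation is an addition to the original solution and assumes the sum starts at $k = 1$; it uses $k^2 = k(k+1) - (k+1) + 1$ and the series for $e$.

```latex
\[
\sum_{k \ge 1} \frac{k^2}{(k+1)!}
  = \sum_{k \ge 1} \frac{1}{(k-1)!}
  - \sum_{k \ge 1} \frac{1}{k!}
  + \sum_{k \ge 1} \frac{1}{(k+1)!}
  = e - (e - 1) + (e - 2)
  = e - 1 .
\]
```

For a permutation of $n$ the sum is finite, so it is a partial sum of this series and is bounded by $e - 1$ for every $n$.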

(c) The minimum running time occurs when $a_n > a_{n+1}$ and this time is $\Theta(n)$. By previous results
the maximum running time is also $\Theta(n)$. Thus the average running time is $\Theta(n)$.

Section  B.3 

B.3.1 (a) If we know $\chi(G)$, then we can determine if $c$ colors are enough by checking if

$c \ge \chi(G)$.

(b) We know that $0 \le \chi(G) \le n$ for a graph with $n$ vertices. Ask if $c$ colors suffice for $c = 0, 1, 2, \ldots$.
The least $c$ for which the answer is "yes" is $\chi(G)$. Thus the worst case time for finding $\chi(G)$
is at most $n$ times the worst case time for the NP-complete problem. Hence one time is $O$ of
a polynomial in $n$ if and only if the other is.
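
The reduction in (b) can be sketched in Python. This is an illustration under stated assumptions, not the book's code: can_color stands in for the decision problem and is implemented here by simple backtracking (exponential time in general), and the graph is assumed to be given as a list of neighbor lists with vertices numbered from 0.

```python
def can_color(adj, c):
    """Decision oracle: can the graph be properly colored with c colors?

    adj[v] is the list of neighbors of vertex v. Plain backtracking search.
    """
    n = len(adj)
    colors = [None] * n

    def extend(v):
        if v == n:
            return True
        for col in range(c):
            # Uncolored neighbors hold None, which never equals col.
            if all(colors[u] != col for u in adj[v]):
                colors[v] = col
                if extend(v + 1):
                    return True
        colors[v] = None  # undo before backtracking
        return False

    return extend(0)


def chromatic_number(adj):
    """Return the least c for which the oracle answers "yes"."""
    for c in range(len(adj) + 1):
        if can_color(adj, c):
            return c
    # Not reached: n colors always suffice for a graph with n vertices.


if __name__ == "__main__":
    triangle = [[1, 2], [0, 2], [0, 1]]
    print(chromatic_number(triangle))
```

The loop in chromatic_number makes at most $n + 1$ oracle calls, so its running time is polynomial in $n$ exactly when the oracle's is.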

B.3.2 (a) Since the largest possible value for $K$ is $|S|$, one can use the idea that was used in the
previous exercise for $\chi(G)$.

(b) Since $K$ bins can hold at most $KB$ and they must hold $\sum_{s \in S} s$, we have $K(S,B)\,B \ge \sum_{s \in S} s$.

(c) Let $FF(S,B)$ be the number of bins actually used. Each item in $S$ must be placed in a
bin and we must look into at most $FF(S,B)$ bins to do so. Assuming we associate with
each bin the amount of space remaining (or used) in the bin, each look takes a constant
amount of time. Thus the worst case running time is $O(FF(S,B)\,|S|)$. It is actually possible
to arrange for about that many looks to be required: Let the first $FF(S,B) - 1$ items in $S$
have size $B$ and the rest have size $\epsilon$, a very small number. The number of looks required is
$1 + 2 + \cdots + (FF(S,B) - 1) + (|S| - FF(S,B) + 1)\,FF(S,B)$, which is $\Theta(FF(S,B)\,|S|)$. The
analysis of average time is beyond our scope.
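
The look count in (c) can be reproduced with a short Python sketch of First Fit. This is an illustration, not the book's code: the name first_fit is an assumption, and a "look" is taken to mean examining one bin, including the newly opened bin an item finally lands in. Integer sizes stand in for $B$ and $\epsilon$.

```python
def first_fit(sizes, capacity):
    """Pack items in the given order by First Fit.

    Returns (remaining, looks): the free space left in each bin and the
    total number of bins examined over all placements.
    """
    remaining = []
    looks = 0
    for s in sizes:
        for j, space in enumerate(remaining):
            looks += 1
            if s <= space:
                remaining[j] -= s
                break
        else:
            looks += 1  # the newly opened bin counts as one look
            remaining.append(capacity - s)
    return remaining, looks


if __name__ == "__main__":
    B, eps = 1000, 1
    ff, n_items = 4, 10
    sizes = [B] * (ff - 1) + [eps] * (n_items - ff + 1)
    remaining, looks = first_fit(sizes, B)
    formula = sum(range(1, ff)) + (n_items - ff + 1) * ff
    print(len(remaining), looks, formula)
```

Storing the remaining space per bin is what makes each look a constant-time comparison, as the solution assumes.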

(d) We can never have more than one bin that is at most half full. Suppose we think we have
used the algorithm correctly and that bins $i$ and $j > i$ are at most half full. Since everything
that fit into bin $j$ could have been placed into bin $i$, the algorithm should have placed them
there or in some earlier bin. Thus we could not have ended with more than one bin at most
half full. It follows that $FF(S,B) - 1$ of the bins each contain items summing to over $B/2$.
Thus $\sum_{s \in S} s$ exceeds $(FF(S,B) - 1)B/2$ and so, by the earlier result $K(S,B)\,B \ge \sum_{s \in S} s$,


$FF(S,B) < 1 + 2\sum_{s \in S} s/B \le 1 + 2K(S,B)$.
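
The two facts used in (d) can be tested on random instances. The sketch below is an addition, with hypothetical names first_fit_loads and check_bounds, and it assumes positive integer sizes no larger than the bin capacity. Computing $K(S,B)$ exactly is hard, so the code checks only the intermediate inequality $FF(S,B) < 1 + 2\sum s/B$ and the property that at most one bin is at most half full.

```python
import random


def first_fit_loads(sizes, capacity):
    """Return the total size placed in each bin by First Fit."""
    loads = []
    for s in sizes:
        for j in range(len(loads)):
            if loads[j] + s <= capacity:
                loads[j] += s
                break
        else:
            loads.append(s)
    return loads


def check_bounds(sizes, capacity):
    """Check the half-full property and FF < 1 + 2 * sum(sizes) / capacity."""
    loads = first_fit_loads(sizes, capacity)
    ff = len(loads)
    total = sum(sizes)
    at_most_half = sum(1 for load in loads if 2 * load <= capacity)
    # Multiply through by capacity to stay in integer arithmetic.
    return at_most_half <= 1 and ff * capacity < capacity + 2 * total


if __name__ == "__main__":
    random.seed(0)
    trials = [[random.randint(1, 10) for _ in range(20)] for _ in range(100)]
    print(all(check_bounds(sizes, 10) for sizes in trials))
```

A passing check on random data is only evidence, not a proof; the argument in (d) is what establishes the bound.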


